A field guide for the business

The physics of waiting.

Why lines form, why your customers give up, and why the same number of agents can deliver brilliant service one day and a meltdown the next. No maths degree required.


01
Why it matters

Waiting is a property of the system, not a measure of effort.

Here's the uncomfortable truth that queueing theory hands every operations leader: your customers' wait times are mostly decided before the day even starts by how you've arranged things. Not by how hard people work on the phones.

Two facts drive almost everything that follows, and both cost real money:

Long waits don't just annoy. They walk out the door.

Every customer who gives up while waiting is a lost sale, a complaint, or a callback that costs you twice. Abandonment is the most expensive number most centres barely look at.

The same team can be fast or slow depending on how you organise it.

Take a fixed group of agents. Slice them up one way and customers wait seconds. Slice them another way, with the exact same headcount, and the queue collapses. The structure is the lever. That's the whole game.

Get the structure right and you buy shorter waits, fewer walkaways, and calmer staff without hiring a single extra person. Get it wrong and no amount of effort rescues it. The good news is that the rules are simple and they're below.


02
What it actually is

Three ingredients, and the numbers they cook up.

Strip away the jargon and a queue is just three things meeting each other: work that arrives, how long each job takes, and how many people can do it. Everything customers feel, the waiting, the giving up, falls out of how those three interact.

Queueing theory is simply the study of that interaction. The metrics below are what it produces. The business name is the one your board cares about. The technical name is the one your WFM and analytics teams use. They're the same thing wearing different clothes.

Demand
arrival rate
How many contacts show up, and how bursty they are. Rarely steady. The lumpiness is what hurts you.
Handle time
service time
How long one contact ties up one agent. Talk time plus the wrap-up afterwards.
Agents
servers
How many people are eligible to take a given contact. Eligible is the word that matters most.
How busy they are
occupancy / utilisation
The share of agent time actually spent on contacts. The single most dangerous number on the page.
Speed of answer
average wait
How long the average customer sits in the queue. What they experience as "good" or "rubbish".
Walkaways
abandonment
The share who give up before reaching anyone. Rises sharply the longer the wait.
Service level
SL / grade of service
The promise, e.g. "80% answered in 30 seconds". A headline target, not the full story.
Work in the system
queue length
How many contacts are waiting or in progress right now. The visible backlog.

03
The one chart to remember

Busy is good. Too busy is a cliff.

You'd think pushing agents to be busier just makes things gradually a bit slower. It doesn't. Waiting stays low and flat while agents are moderately busy, then near the top it goes vertical. The last sliver of "efficiency" buys you an avalanche of waiting.

Drag the volume below. Watch the occupancy creep up calmly while the wait sits quiet, then explode. This is why planners aim to keep agents around 80–85% busy, never 100%. That headroom isn't waste. It's what keeps the queue from falling off the cliff.

Interactive · the utilisation cliff

One team of 20 agents

Five-minute average handle time. Slide the incoming call volume up and down.

Readouts: how busy (occupancy), average wait (speed of answer), answered in 30s (service level), walkaways (estimated abandonment). Volume slider shown at 180/hr.
What you're seeing. Below ~80% busy, customers barely wait. Push past ~90% and the wait runs away from you. Same 20 agents the whole time. Nothing changed except how full the pipe got.
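If you'd like to reproduce the cliff yourself, here's a minimal sketch using the Erlang C formula from the technical translation at the end of this guide. It assumes Poisson arrivals, exponential handle times and callers who never hang up, so it says nothing about walkaways and won't match the widget's abandonment estimate. The function names (`erlang_b`, `erlang_c`, `queue_metrics`) are illustrative, not the page's own code.

```python
import math


def erlang_b(servers: int, load: float) -> float:
    """Blocking probability via the stable recursion B(n) = A*B(n-1) / (n + A*B(n-1))."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    return b


def erlang_c(servers: int, load: float) -> float:
    """Probability that an arriving call has to wait (needs load < servers)."""
    b = erlang_b(servers, load)
    rho = load / servers
    return b / (1.0 - rho * (1.0 - b))


def queue_metrics(calls_per_hour, aht_minutes, servers, target_seconds=30.0):
    """Return (occupancy, average wait in seconds, service level at the target)."""
    mu = 60.0 / aht_minutes            # calls one agent can finish per hour
    load = calls_per_hour / mu         # offered load A, in Erlangs
    rho = load / servers               # occupancy
    if rho >= 1.0:
        return rho, math.inf, 0.0      # demand exceeds capacity: the queue never drains
    pw = erlang_c(servers, load)
    spare_rate = servers * mu - calls_per_hour   # c*mu - lambda, per hour
    wait_seconds = pw / spare_rate * 3600.0
    service_level = 1.0 - pw * math.exp(-spare_rate * target_seconds / 3600.0)
    return rho, wait_seconds, service_level


# Same 20 agents and five-minute handle time as the interactive above.
for volume in (120, 180, 204, 216, 228, 236):
    rho, wait, sl = queue_metrics(volume, aht_minutes=5, servers=20)
    print(f"{volume:>4}/hr  busy {rho:5.1%}  avg wait {wait:7.1f}s  in 30s {sl:5.1%}")
```

Run it and compare the rows on either side of about 90% busy: occupancy moves in small, even steps while the wait does not.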

04
Who's in the picture

Four players, one shared fate.

Customers

Their patience is limited. Cross it and they're gone, often for good. Their wait is the number that defines your reputation.

Agents

The servers. What matters isn't only how many you have, but how many are allowed to take any given contact.

Planners & WFM

They forecast demand and staff against it. Bigger, simpler pools are far easier to forecast accurately than many tiny ones.

The business

Pays for the agents, lives with the walkaways. Queue structure is one of the few levers that moves both cost and experience at once.


05
Where you steer it

Three places you decide how the queue behaves.

You don't manage waiting on the phones. You design it upstream, in three decisions that most centres make almost by accident. Each one quietly sets how big your pools are, and pool size is everything (you'll see exactly why in the next section).

The IVR menu (call steering)

Every menu branch you add chops your demand into a thinner stream that then needs its own little pool of agents. "Press 1 for this, 2 for that, 3 for the other" feels helpful, but each press is a tax on pool size. Over-segmenting the front door is the most common self-inflicted wound in the building.

→ Identify intent fast, route once, and steer to the fewest, broadest sensible destinations.

Queue design

Fewer, larger queues answer faster than many small ones at the exact same staffing. Splitting a queue so your reports are clearer is fine and often smart. Splitting the agent pool behind it is what quietly costs you waits. Keep those two ideas separate in your head.

→ Report granularly if you like, but pool the people generously.

Agent competency (skills)

This is the lever with the most leverage. An agent's "skills" decide which contacts they're allowed to take. Narrow, specialised skills create lots of tiny pools, and in a tiny pool a customer can be waiting while capable agents who simply haven't been given that skill sit idle ten feet away. Broaden the skills and those idle agents suddenly become available. Same people, far less waiting.

→ Use the fewest skill groups that still keep each pool comfortably large.


06
How & when it bites

Pool generously. Fragmenting is a tax you pay in waiting.

Here's the principle that surprises everyone: a few big pools serve people dramatically faster than many small pools, even with the same total headcount. Think of a bank with one snaking queue feeding ten tellers versus ten separate lines. The snake wins every time, because one slow customer can't trap you behind the wrong wall.

And it bites hardest exactly when things get busy or bursty. Big pools absorb a sudden rush; small pools choke on it. So the moment you can least afford waiting is the moment fragmentation punishes you most.

Below: take one team of 48 agents and the same total call volume. Now chop them into more and more separate skill groups. Watch what happens to the wait and the walkaways. Nothing else changes. Only the number of walls between idle agents and waiting customers.

Interactive · the cost of fragmenting

48 agents, one workload, sliced into pieces

Held at a sensible ~80% busy overall. The only thing you change is how many separate groups the team is split into.

Readouts: skill groups (agents in each), average wait (speed of answer), answered in 30s (service level), walkaways (estimated abandonment). Slider shown at 1 group.
The catch worth saying out loud. Broadening skills to keep pools big is powerful, but it isn't free. Push it too far and agents become generalists who handle each contact a little slower and resolve fewer tricky cases first time. The art is finding the fewest skill groups that keep pools large without eroding expertise. When you consolidate, watch handle time and first-contact resolution, not just the wait. Buying speed by quietly losing quality is a trade, not a win.
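The fragmentation effect can be sketched with the same Erlang C maths the engine room uses. This version assumes each skill group is a fully separate pool with an equal share of agents and demand, a five-minute handle time (an assumption, since this interactive doesn't state one), and no abandonment. It also holds handle time constant, so it ignores the quality trade described in the catch above.

```python
def erlang_c(servers: int, load: float) -> float:
    """Probability of delay, built on the Erlang-B recursion (needs load < servers)."""
    b = 1.0
    for n in range(1, servers + 1):
        b = load * b / (n + load * b)
    rho = load / servers
    return b / (1.0 - rho * (1.0 - b))


def mean_wait_seconds(servers: int, load: float, aht_seconds: float) -> float:
    """Wq = Pw * (1/mu) / (c - A), with 1/mu given in seconds."""
    return erlang_c(servers, load) * aht_seconds / (servers - load)


TOTAL_AGENTS = 48
OCCUPANCY = 0.80        # held constant, as in the interactive
AHT_SECONDS = 300.0     # assumed five-minute handle time


def fragment(groups: int):
    """Split agents and demand evenly into separate pools; return (agents each, wait)."""
    agents_each = TOTAL_AGENTS // groups
    load_each = OCCUPANCY * agents_each
    return agents_each, mean_wait_seconds(agents_each, load_each, AHT_SECONDS)


for groups in (1, 2, 3, 4, 6, 8, 12):
    agents_each, wait = fragment(groups)
    print(f"{groups:>2} groups x {agents_each:>2} agents  avg wait {wait:6.1f}s")
```

Every row has the same headcount and the same 80% occupancy. Only the pool size changes, and the wait climbs with each extra split.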

07
When queues share agents

Overlapping queues, and who gets to go first.

Real centres rarely have one queue per team. More often, several queues all point at the same group of agents. One team quietly serves sales, billing, and complaints all at once. That's overlapping queues, and it's pooling done on purpose: the capacity is shared, so the waits drop. The catch is that those queues now compete for the same people.

Diagram: the Billing, VIP and Sales queues all feed one shared agent pool.

Priority is how you settle the competition. Hand one queue priority and its callers jump ahead of the others every time an agent frees up. That's exactly what you want for a VIP line or a vulnerable-customer queue. But it has a sharp edge, and most people don't see it coming:

High priority for one queue means starvation risk for another.

When the centre gets busy, a priority stream keeps cutting to the front, so the lower-priority queue waits longer and longer. Push it far enough and the low queue effectively stops moving. Priority doesn't create capacity. It just decides who suffers the shortage.

The elegant middle path is ring (or "bullseye") routing: start a contact aimed at the smallest, best-matched group of agents, and widen the eligible pool the longer they wait. Specialists when there's slack, the whole pool when there isn't. You get expertise and pooling, and you decide the trade with a timer instead of a wall.
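Here's a rough sketch of how ring routing might be expressed. Everything in it is hypothetical: the `Ring` type, the group names and the 0, 20 and 60 second timers are invented for illustration, and real routing platforms each have their own way of configuring this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ring:
    opens_after_seconds: float   # how long a contact must wait before this ring opens
    skill_groups: frozenset      # agent groups that become eligible at that point


def eligible_groups(rings, waited_seconds):
    """Return every group whose ring has opened for a contact that has waited this long."""
    groups = set()
    for ring in rings:
        if waited_seconds >= ring.opens_after_seconds:
            groups |= ring.skill_groups
    return groups


# Hypothetical billing contact: specialists first, then wider and wider.
BILLING_RINGS = [
    Ring(0, frozenset({"billing_specialists"})),
    Ring(20, frozenset({"billing_general"})),
    Ring(60, frozenset({"sales", "complaints"})),
]

for waited in (0, 25, 90):
    print(waited, sorted(eligible_groups(BILLING_RINGS, waited)))
```

The timers are the dial. Shorten them and you lean towards pooling; lengthen them and you lean towards expertise.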

All of this is easier to feel than to read about. So let's actually run it.


08
Watch it run

A live queue, one event at a time.

This is a proper discrete-event simulation, the same approach the open-source queueing tools use: calls arrive at random, grab a free agent if one's eligible, get served for a random handle time, and give up if they wait past their patience. Nothing here is averaged or pre-baked. It's individual calls and individual agents, playing out in front of you.

Eight agents. Two kinds of call: everyday Standard and impatient VIP. Try the experiments below the stage.

Interactive · live discrete-event simulation

One contact centre, running in real time

Press play. Then push the volume up and watch the queue build. Flip priority on and off. Switch between one shared pool and dedicated splits.

Readouts for your current setup (these update as you change the controls): how busy (live occupancy), VIP wait and Standard wait (average for answered calls), walkaways (gave up waiting). The stage shows the agents and the waiting queue, with a legend for Standard call, VIP call and idle agent. Volume slider shown at 2.0/min.
Three experiments worth running. One: drag the volume past about 2.7/min, where the eight agents can't keep up, and watch walkaways climb as the queue stops draining. Two: at high volume, toggle VIP priority off and on. With it on, VIP waits stay short while Standard waits balloon. That's protection for one queue paid for by the other. Three: switch to "Dedicated split" and compare. Carving the same eight agents into a VIP pool and a Standard pool leaves VIP agents idle while Standard callers pile up. The shared pool almost always serves more people, faster.
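For readers who want to run these experiments offline, here's a minimal event-scheduling sketch of a shared pool with optional VIP priority. It is not the page's own simulator. The three-minute mean handle time is an assumption, chosen so that eight agents top out near 2.7 calls a minute, and the two-minute mean patience and 20% VIP share are invented for illustration. A dedicated split can be approximated by running it once per pool.

```python
import heapq
import random
from collections import deque


def simulate(rate_per_min, agents=8, mean_handle=3.0, mean_patience=2.0,
             vip_share=0.2, vip_priority=True, horizon_min=480.0, seed=1):
    """One shared pool serving VIP and Standard calls; all times are in minutes."""
    rng = random.Random(seed)
    events = []                      # heap of (time, tie_breaker, kind, call_id)
    counter = 0

    def schedule(when, kind, call_id):
        nonlocal counter
        heapq.heappush(events, (when, counter, kind, call_id))
        counter += 1

    waiting = {}                     # call_id -> (class, arrival time) while queued
    queues = {"vip": deque(), "std": deque()}
    free_agents = agents
    answered = {"vip": 0, "std": 0}
    abandoned = {"vip": 0, "std": 0}
    wait_total = {"vip": 0.0, "std": 0.0}

    def head(cls):
        """Peek at the oldest queued call of a class, skipping callers who gave up."""
        q = queues[cls]
        while q and q[0] not in waiting:
            q.popleft()
        return q[0] if q else None

    def next_call():
        vip, std = head("vip"), head("std")
        if vip is None or std is None:
            chosen = vip if vip is not None else std
        elif vip_priority:
            chosen = vip             # VIP jumps ahead whenever an agent frees up
        else:                        # first come, first served across both classes
            chosen = vip if waiting[vip][1] <= waiting[std][1] else std
        if chosen is not None:
            queues[waiting[chosen][0]].popleft()
        return chosen

    def start_service(call_id, cls, arrived, now):
        answered[cls] += 1
        wait_total[cls] += now - arrived
        schedule(now + rng.expovariate(1.0 / mean_handle), "depart", call_id)

    schedule(rng.expovariate(rate_per_min), "arrive", 0)
    while events:
        now, _, kind, call_id = heapq.heappop(events)
        if now > horizon_min:
            break
        if kind == "arrive":
            schedule(now + rng.expovariate(rate_per_min), "arrive", call_id + 1)
            cls = "vip" if rng.random() < vip_share else "std"
            if free_agents > 0:
                free_agents -= 1
                start_service(call_id, cls, now, now)
            else:
                waiting[call_id] = (cls, now)
                queues[cls].append(call_id)
                schedule(now + rng.expovariate(1.0 / mean_patience), "renege", call_id)
        elif kind == "renege":
            if call_id in waiting:   # still queued, so the caller hangs up
                cls, _ = waiting.pop(call_id)
                abandoned[cls] += 1
        else:                        # "depart": an agent has finished a call
            call = next_call()
            if call is None:
                free_agents += 1
            else:
                cls, arrived = waiting.pop(call)
                start_service(call, cls, arrived, now)

    return {cls: {"answered": answered[cls], "abandoned": abandoned[cls],
                  "avg_wait_min": wait_total[cls] / answered[cls] if answered[cls] else 0.0}
            for cls in ("vip", "std")}


for priority in (False, True):
    print("VIP priority", priority, simulate(3.2, vip_priority=priority))
```

Results vary with the seed, so treat a single run as an anecdote and compare averages over several seeds before drawing conclusions.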

09
Read the tape

The same run, replayed as a moving chart.

The agent stage above shows you the now. This shows you the story over time, the way a trading screen does. While the simulation up there is playing, these charts record every metric live. Save a run, change the settings, save another, and see them all from t=0 on the same chart.

Interactive · the live tape

Waits and walkaways, second by second

Press play on the simulation above first. Save a run when you're happy, then change mode or priority and run another. Saved runs overlay from t=0 so you can compare directly.

Chart legend: VIP wait, Standard wait, walkaways (right axis). Saved runs overlay from t=0 in their own colour, and a scrubber moves through time.
Session comparison table (average so far under each structure): VIP wait, Standard wait and walkaways, for Shared pool versus Dedicated split.
Interactive · capacity vs demand

How busy the team is, against the calls coming in

When the green line pins near the top, you've run out of headroom, and that's exactly when the waits above take off. Same timeline, same scrubber.

Chart legend: how busy the agents are (left axis), calls arriving (right axis). A scrubber moves through time.
The simulator uses exponential handle times and patience, which is the textbook assumption and fine for teaching the concepts, but real-world handle times are lumpier. For your environment, you'd calibrate against your own data.

10
The pocket version

Six rules of thumb for the next planning meeting.

  1. Aim for ~80–85% busy, never 100%. The last slice of utilisation buys you runaway waits. Headroom is the price of a calm queue.
  2. Merge before you split. When in doubt, pool. Bigger pools are faster, steadier, and easier to forecast.
  3. Treat every IVR branch as a cost. Each one shrinks a pool. Make every menu option earn its place.
  4. Separate "report" from "route". You can slice the numbers finely without slicing the people. Granular reporting, generous pooling.
  5. Skill breadth is pool size. The width of your agents' skills is the most powerful wait-time lever you own. Use it deliberately.
  6. When you consolidate skills, watch quality. If waits drop but handle time and resolution hold, you got a genuine win. If quality slips, you made a trade, and you should know its size.
For the engine room

The technical translation.

Everything above is the same theory your analysts, WFM team and platform engineers use. Here's the bridge so the business and the engine room are pointing at the same things. Hand this section to whoever builds the routing.

Business term | Technical term | Symbol
Demand / calls arriving | Arrival rate (Poisson) | λ
Handle time | Mean service time | 1 / μ
Agents / servers | Channels | c
Offered demand | Offered load (Erlangs) | A = λ / μ
How busy agents are | Utilisation / traffic intensity | ρ = A / c
Chance a customer waits | Probability of delay (Erlang C) | Pw
Average wait / speed of answer | Mean waiting time in queue | Wq
% answered in target | Service level at time T | SL(T)
Work in the system | Mean number in system | L
Walkaways | Abandonment (Erlang A / patience) | θ
Little's Law
L = λ × W
The backlog you see equals how fast work arrives times how long it stays. It holds for any stable system, whatever the arrival or handle-time pattern, which makes it the sanity check behind every capacity model.
Utilisation
ρ = λ / (c·μ) = A / c
The "how busy" gauge. As ρ → 1, queue length and wait → ∞. This is the cliff in chart 03, in one symbol.
Erlang C - probability of delay
Pw = Erlang C(c, A)
The workhorse staffing model. Computed via the stable Erlang-B recursion, then converted. Mean wait Wq = Pw · (1/μ) / (c − A).
Discrete-event simulation
arrive → seize → serve → renege
The live sim in section 08 is an event-scheduling DES, the same general approach used by SimPy, Ciw and SIM.JS. Poisson arrivals, exponential service and patience, priority classes, and pooled vs dedicated servers. When the maths gets too gnarly to solve on paper (overlap, priority, abandonment together), you simulate.

A fair-warning footnote. The classic Erlang C model assumes nobody hangs up. Real customers do. Once you add patience, the proper model is Erlang A (the abandonment row above), which predicts shorter waits than Erlang C because impatient callers leave and relieve the queue. The walkaway figures in the interactives use a simple patience approximation for illustration. For production staffing, model abandonment explicitly. Treat the numbers here as the shape of the truth, not a quote to staff against.
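To make the footnote concrete, here's a minimal sketch of an Erlang A style calculation. It doesn't use a closed-form formula. It solves the steady state of the queue numerically as a birth-death chain, with Poisson arrivals, exponential handle times and exponential patience at rate θ. The function name `erlang_a`, the `max_queue` truncation and the two-minute mean patience in the example are all assumptions for illustration. Setting θ to zero recovers the no-abandonment (Erlang C) case, provided utilisation is below 1.

```python
def erlang_a(arrival_rate, service_rate, servers, abandon_rate, max_queue=2000):
    """Return (probability of waiting, mean wait, share abandoning) for M/M/c+M.

    All rates must share one time unit. The queue is truncated at max_queue,
    which is harmless as long as the tail probabilities are negligible.
    """
    # Unnormalised state probabilities from the balance equations:
    # p[n] = p[n-1] * lambda / (service completions + abandonments in state n).
    weights = [1.0]
    for n in range(1, servers + max_queue + 1):
        busy = min(n, servers)
        queued = max(n - servers, 0)
        death_rate = busy * service_rate + queued * abandon_rate
        weights.append(weights[-1] * arrival_rate / death_rate)
    total = sum(weights)
    probs = [w / total for w in weights]

    queue_len = sum((n - servers) * p for n, p in enumerate(probs) if n > servers)
    p_wait = sum(probs[servers:])                  # arrivals see time averages (PASTA)
    mean_wait = queue_len / arrival_rate           # Little's Law, over all callers
    p_abandon = abandon_rate * queue_len / arrival_rate
    return p_wait, mean_wait, p_abandon


# 20 agents, five-minute handle time (12 per hour), 228 calls per hour (95% busy).
# Patience: mean two minutes, so theta = 30 per hour (hypothetical).
for theta in (0.0, 30.0):
    p_wait, wait_hours, p_abandon = erlang_a(228, 12, 20, theta)
    print(f"theta={theta:>4}/hr  P(wait) {p_wait:5.1%}  "
          f"avg wait {wait_hours * 3600:6.1f}s  abandon {p_abandon:5.1%}")
```

The mean wait here is averaged over every caller, including those who hang up, which is one reason published Erlang A figures can differ depending on whose wait is being reported.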