Load Balancing¶
Intent¶
Load Balancing preserves throughput, responsiveness, fairness, or resilience by distributing incoming flow or work across multiple viable receivers, paths, resources, or agents according to capacity, health, priority, or policy.
The archetype is useful when one part of a system is overloaded while other usable capacity remains available. Instead of reducing total demand, holding it in a buffer, or switching only after a failure, Load Balancing changes assignment: it sends work where it can be handled.
In compact form:
When uneven assignment creates localized overload despite available capacity elsewhere, distribute flow across viable receivers to improve utilization and preserve service.
Primes¶
Composed of: Routing Rule, Capacity Signal, Resource Pooling, Admission Control, Distribution Policy, Health Check, Feedback, Monitoring
Related primes: Flow, Network, Coupling, Resource Management, Constraint, Queueing, Observability, Scalability, Resilience, Robustness, Scheduling, Trade-offs, Fairness
Structural Signature¶
This archetype is a strong candidate when the following conditions co-occur:
- A flow of work, requests, traffic, demand, patients, tasks, energy, goods, or attention must be assigned somewhere.
- There are multiple possible receivers, paths, workers, servers, queues, locations, routes, teams, or capacity pools.
- The receivers are not all equally loaded, healthy, local, capable, or appropriate at every moment.
- Uneven assignment can create localized overload, latency, fragility, underutilization, or unfairness.
- The system can route, assign, shift, or redistribute the flow.
- The system can observe or estimate enough about capacity, load, health, priority, or suitability to make assignment decisions.
Load Balancing is especially relevant when total capacity may be sufficient, but poor distribution makes the system behave as if capacity were scarce.
Intervention Signature¶
Route, assign, or distribute flow across viable capacity according to a balancing policy informed by capacity, health, load, priority, locality, or fairness signals.
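The intervention changes the system from concentrated assignment, where a few receivers are overloaded while other viable capacity sits idle, to governed distribution, where flow is assigned across viable receivers according to an explicit policy.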
The key move is not merely splitting evenly. A mature load-balancing design asks which receivers are viable, how much each can absorb, what kind of work each should receive, and how the assignment policy changes when conditions change.
Causal Logic¶
Localized overload can happen even when aggregate capacity is adequate. One server receives too much traffic while others sit idle. One team absorbs most requests because it is most visible. One hospital, queue, warehouse, route, worker group, or power line becomes a hot spot while alternatives remain underused.
Load Balancing works by changing assignment topology.
- Available capacity is made visible. The system observes or estimates receiver health, queue depth, utilization, response time, location, skill, or availability.
- A distribution policy selects destinations. Work is assigned by round-robin, weight, least-loaded state, priority, locality, fairness, affinity, cost, or suitability.
- Flow shifts away from hot spots. Overloaded or unhealthy receivers receive less work.
- Idle capacity is used. Receivers that can absorb more work receive more of it.
- Localized overload declines. The system avoids failure caused by poor assignment rather than true lack of total capacity.
- The policy adapts as conditions change. Feedback or monitoring updates assignment decisions.
The archetype converts concentrated load into distributed load.
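The steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Receiver` fields, the `pick_receiver` and `assign` helpers, and the sample pool are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Receiver:
    name: str
    capacity: int          # hypothetical: units of work the receiver can hold
    in_flight: int = 0     # current load (the capacity signal)
    healthy: bool = True   # result of some health check

    def utilization(self) -> float:
        return self.in_flight / self.capacity


def pick_receiver(receivers):
    """Return the viable receiver with the lowest utilization, or None."""
    viable = [r for r in receivers if r.healthy and r.in_flight < r.capacity]
    if not viable:
        return None
    return min(viable, key=lambda r: r.utilization())


def assign(receivers, n_items):
    """Assign n_items one at a time, re-reading load before each choice."""
    placed = {r.name: 0 for r in receivers}
    for _ in range(n_items):
        target = pick_receiver(receivers)
        if target is None:
            break  # no viable capacity left; the caller must handle the rest
        target.in_flight += 1  # feedback: new load influences the next choice
        placed[target.name] += 1
    return placed


pool = [
    Receiver("a", capacity=10, in_flight=9),               # hot spot
    Receiver("b", capacity=10, in_flight=1),               # idle capacity
    Receiver("c", capacity=10, in_flight=0, healthy=False),  # not viable
]
placed = assign(pool, 6)
print(placed)
```

In this sketch the hot spot `a` and the unhealthy receiver `c` receive nothing, and the idle receiver `b` absorbs the new work. Real systems usually read load from delayed or sampled signals rather than exact counters, which is where stale-signal and feedback-lag problems come from.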
What It Is Not¶
Load Balancing is not Failover. Failover transfers function or flow to an alternate path when a primary path fails or degrades. Load Balancing distributes ordinary or live load across multiple viable receivers, often before any receiver has fully failed.
Load Balancing is not Rate Limiting. Rate Limiting caps how much or how often flow is admitted. Load Balancing assigns admitted flow across available capacity.
Load Balancing is not Backpressure. Backpressure communicates downstream capacity pressure upstream so production slows. Load Balancing changes which downstream receiver gets the work.
Load Balancing is not Buffering. Buffering holds flow temporarily to absorb timing mismatch. Load Balancing redirects or assigns flow to available capacity.
Load Balancing is not Bulkhead Isolation. Bulkhead Isolation partitions capacity to contain failure. Load Balancing often uses pooled capacity, though it may operate within or across bulkheads.
Load Balancing is not Flow Diversion / Rerouting. Rerouting diverts flow away from a blocked, harmful, overloaded, or undesirable path. Load Balancing distributes flow across multiple viable receivers or paths to improve utilization and reduce localized overload.
Load Balancing is not equal splitting. Equal distribution can be bad balancing when receivers differ in capacity, health, task suitability, locality, or priority.
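A small arithmetic sketch shows why an equal split can be poor balancing. The receiver names, capacities, and demand figure below are invented for illustration.

```python
def equal_split(total, capacities):
    """Give every receiver the same share, ignoring capacity."""
    share = total / len(capacities)
    return {name: share for name in capacities}


def capacity_weighted_split(total, capacities):
    """Give every receiver a share proportional to its capacity."""
    pool = sum(capacities.values())
    return {name: total * cap / pool for name, cap in capacities.items()}


def utilization(assignment, capacities):
    """Assigned work divided by capacity, per receiver."""
    return {name: assignment[name] / capacities[name] for name in capacities}


capacities = {"small": 50, "medium": 100, "large": 250}  # hypothetical units
demand = 300                                             # below total capacity of 400

equal = utilization(equal_split(demand, capacities), capacities)
weighted = utilization(capacity_weighted_split(demand, capacities), capacities)

print("equal split:   ", equal)
print("weighted split:", weighted)
```

With these numbers the equal split pushes the small receiver to twice its capacity while the large one sits at 40 percent, even though total demand is below total capacity. The capacity-weighted split puts every receiver at the same 75 percent utilization.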
Composition¶
Load Balancing is composed from several lower-level abstractions:
- Flow — Something must be routed, assigned, distributed, or allocated.
- Network / topology — There must be multiple possible paths, receivers, or capacity nodes.
- Resource management — The intervention allocates work across capacity.
- Capacity signal — The system benefits from knowing receiver load, health, or availability.
- Routing rule — Work must be assigned according to a policy.
- Health check — Receivers that are unhealthy, saturated, or unavailable should be avoided or downweighted.
- Feedback — Assignment quality improves when current outcomes influence future routing.
- Monitoring — Distribution effects, hot spots, and failure modes must be visible.
The composition matters. Without multiple viable receivers, there is nothing to balance. Without capacity or health signals, balancing may be blind. Without routing control, observation does not change assignment. Without invariants around state or affinity, redistributed work may break correctness.
Mechanism Families¶
Common mechanism families include:
- Software load balancers — Requests are distributed across servers, services, or instances.
- DNS or network traffic distribution — Traffic is routed across endpoints, regions, or paths.
- Queue worker assignment — Jobs are distributed across workers according to availability or queue depth.
- Compute job scheduling across workers — Workloads are assigned across machines, clusters, or processors.
- Call center or ticket routing — Requests are distributed among agents or teams based on availability, skill, or priority.
- Logistics route and warehouse assignment — Orders, deliveries, or inventory flows are distributed across warehouses, routes, or carriers.
- Power load distribution — Electrical load is balanced across generation, transmission, or distribution capacity.
- Healthcare or public-service capacity routing — Patients, cases, or requests are routed across facilities or service points.
- Human workload assignment — Tasks are distributed across people or teams to prevent localized overload.
- Platform market matching across capacity — Demand is routed across available suppliers, workers, drivers, hosts, or providers.
These mechanisms differ by domain, but they preserve the same intervention logic: assign flow across viable capacity to reduce hot spots and improve utilization.
Parameter Dimensions¶
Concrete mechanisms usually require tuning along dimensions such as:
- Balancing algorithm — Round-robin, least-loaded, weighted, priority-aware, random, locality-aware, cost-aware, or adaptive?
- Capacity weight — How much flow should each receiver receive relative to capacity?
- Health-check cadence — How often is receiver health measured?
- Routing granularity — Is work assigned per request, batch, session, case, route, or time window?
- Stickiness or affinity rule — Must certain work remain with the same receiver?
- Overload threshold — When is a receiver considered too loaded?
- Receiver weight — How much relative preference does each receiver get?
- Failout threshold — When is a receiver removed from rotation?
- Rebalancing cadence — How often are assignments redistributed?
- Locality preference — Should work prefer nearby or contextually close receivers?
- Priority class rule — Do some flows receive special assignment?
- Fairness constraint — How evenly should work or burden be distributed?
- Maximum assignment skew — How unequal may the distribution become?
- Backoff or cooldown after failure — How long before a failed receiver re-enters?
These are parameter dimensions, not the archetype itself.
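One way to keep these dimensions explicit is to collect them in a single configuration object. The sketch below is an assumption about how such a structure might look: the field names, default values, and validation rules are hypothetical and are not taken from any particular load balancer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class BalancingConfig:
    # All names and defaults here are illustrative assumptions.
    algorithm: str = "weighted_least_loaded"
    receiver_weights: Dict[str, float] = field(default_factory=dict)
    health_check_interval_s: float = 5.0
    routing_granularity: str = "request"   # or "batch", "session", ...
    sticky_sessions: bool = False
    overload_threshold: float = 0.85       # utilization treated as "too loaded"
    failout_threshold: int = 3             # failed checks before removal
    rebalance_interval_s: float = 30.0
    max_assignment_skew: float = 2.0       # max ratio of busiest to least busy
    cooldown_after_failure_s: float = 60.0

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is usable."""
        problems = []
        if not 0.0 < self.overload_threshold <= 1.0:
            problems.append("overload_threshold must be in (0, 1]")
        if self.failout_threshold < 1:
            problems.append("failout_threshold must be at least 1")
        if self.max_assignment_skew < 1.0:
            problems.append("max_assignment_skew must be at least 1.0")
        if any(w < 0 for w in self.receiver_weights.values()):
            problems.append("receiver weights must be non-negative")
        return problems


config = BalancingConfig(receiver_weights={"a": 1.0, "b": 2.0})
print(config.validate())
```

Writing the parameters down in one place makes the assignment policy reviewable, which supports the invariant that the system should know why work goes where it goes.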
Invariants to Preserve¶
Load Balancing should preserve explicit invariants:
- Work is sent only to viable receivers — Unhealthy or incapable receivers should not continue receiving ordinary work.
- Assignment policy is defined — The system should know why work goes where it goes.
- Capacity or health signal is not ignored — Balancing should respond to meaningful receiver conditions.
- Stateful work preserves required affinity or consistency — Redistribution should not break sessions, transactions, ownership, or context.
- No receiver is overloaded while equivalent capacity remains idle — Avoidable hot spots should be corrected.
- Failed or unhealthy receivers are removed or downweighted — The policy should not treat all receivers as equal when they are not.
- The balancing layer does not silently drop work — Assignment failure should be explicit.
- Critical work preserves priority or safety requirements — Distribution should not undermine priority policies.
If these invariants cannot be preserved, load balancing may create instability, inconsistency, or hidden failure.
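Several of these invariants can be enforced directly in the routing function. The following sketch assumes a hypothetical in-memory view of receivers and session ownership; `route`, `NoViableReceiver`, and the dictionary layout are illustrative names only. It routes only to viable receivers, keeps stateful sessions with their owner, and raises an explicit error instead of silently dropping work.

```python
class NoViableReceiver(Exception):
    """Raised when work cannot be assigned; the failure is explicit, not silent."""


def route(session_id, receivers, owners):
    """Pick a receiver name for one unit of work.

    receivers: name -> {"healthy": bool, "load": int, "capacity": int}
    owners:    session_id -> receiver name, for sessions that need affinity
    """
    def viable(name):
        r = receivers[name]
        return r["healthy"] and r["load"] < r["capacity"]

    owner = owners.get(session_id)
    if owner is not None:
        # Stateful work stays with its owner. Moving it would need a state
        # handoff, which this sketch does not model, so it fails loudly.
        if viable(owner):
            return owner
        raise NoViableReceiver(f"owner {owner!r} of session {session_id!r} is not viable")

    candidates = [name for name in receivers if viable(name)]
    if not candidates:
        raise NoViableReceiver("no healthy receiver with spare capacity")
    return min(candidates, key=lambda n: receivers[n]["load"] / receivers[n]["capacity"])


receivers = {
    "a": {"healthy": True, "load": 8, "capacity": 10},
    "b": {"healthy": True, "load": 2, "capacity": 10},
    "c": {"healthy": False, "load": 0, "capacity": 10},
}
owners = {"s1": "a", "s2": "c"}

print(route("s1", receivers, owners))   # affinity keeps s1 on its owner
print(route("new", receivers, owners))  # stateless work goes to the least loaded
```

Failing loudly when a session's owner is not viable is a design choice made for this sketch. A system that can migrate state safely might hand the session to another receiver instead.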
Tradeoffs¶
Load Balancing accepts routing and coordination costs to improve capacity use and resilience.
Typical tradeoffs include:
- Routing complexity increases because work must be assigned intelligently.
- Coordination overhead rises because receiver state must be tracked.
- State affinity may constrain distribution when work cannot safely move freely.
- Receiver quality may vary, so balancing may send work to less capable nodes.
- Monitoring burden increases because load, health, and outcomes must be observed.
- Fairness and efficiency may conflict when the most efficient assignment creates unequal burden.
- Overcorrection can occur if the system shifts too much work away from a perceived hot spot.
- The balancer can become a bottleneck or failure point if not designed carefully.
The archetype is therefore not merely “spread things out.” It is a governed assignment strategy under capacity and reliability constraints.
Contraindications¶
Load Balancing is a poor fit when work cannot be safely assigned across multiple receivers.
Use cautiously or avoid when:
- only one viable receiver exists,
- flow cannot be rerouted or reassigned,
- state affinity prevents safe redistribution,
- capacity or health signals are unavailable or misleading,
- distribution overhead exceeds the overload benefit,
- all receivers share the same binding bottleneck,
- fairness policy requires a different allocation rule,
- the balancing layer would create a more fragile central dependency,
- the real problem is excess total demand rather than uneven distribution,
- redistribution would violate safety, ownership, locality, legal, or ethical constraints.
In such cases, rate limiting, backpressure, buffering, failover, load shedding, capacity expansion, bulkhead isolation, or redesign may be more appropriate.
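A rough first check for the "excess total demand" contraindication is to compare aggregate load with aggregate capacity before looking at individual receivers. The `diagnose` helper and its labels below are hypothetical, and the check assumes load and capacity are measured in the same units. It does not detect a shared bottleneck behind the receivers.

```python
def diagnose(load, capacity):
    """Classify an overload situation from per-receiver load and capacity.

    Returns (label, hot_spots), where hot_spots lists receivers whose load
    exceeds their own capacity.
    """
    total_load = sum(load.values())
    total_capacity = sum(capacity.values())
    hot_spots = sorted(name for name in load if load[name] > capacity[name])

    if total_load > total_capacity:
        # Redistribution alone cannot help; consider rate limiting, load
        # shedding, or capacity expansion.
        return "excess_total_demand", hot_spots
    if hot_spots:
        # Enough capacity exists in aggregate, so assignment is the problem.
        return "uneven_distribution", hot_spots
    return "ok", hot_spots


capacity = {"a": 100, "b": 100, "c": 100}
print(diagnose({"a": 180, "b": 40, "c": 30}, capacity))
print(diagnose({"a": 180, "b": 90, "c": 90}, capacity))
```

Only the first case, where total load fits within total capacity but one receiver is over its limit, is the situation this archetype targets.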
Failure Modes¶
Common failure modes include:
- Hot spot creation — The balancing policy unintentionally concentrates load.
- Stale capacity signal — Assignments are based on outdated receiver state.
- Unhealthy receiver selection — Work continues flowing to failed or degraded receivers.
- State affinity violation — Work is routed to a receiver that lacks needed context or ownership.
- Thrashing or route flapping — Work shifts too rapidly between receivers.
- Central balancer bottleneck — The balancer itself becomes overloaded or critical.
- Fairness collapse — Some receivers, workers, regions, or users bear disproportionate load.
- Hidden shared bottleneck — Balanced receivers still depend on the same constrained backend.
- Capacity fragmentation — Small pools cannot help each other effectively.
- Overload migration — The hot spot moves rather than disappearing.
- Sticky session pathology — Affinity rules prevent useful redistribution.
- Feedback lag — The system reacts after overload has already accumulated.
- Treating equal split as balance — Equal distribution ignores real differences in capacity or suitability.
These failure modes are part of the archetype's design space: a balancing design should anticipate them rather than treat them as rare exceptions.
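Thrashing and route flapping are often damped with hysteresis: a receiver changes state only after several consecutive observations agree. The sketch below illustrates that idea for health status. `HealthGate` and its thresholds are hypothetical and not drawn from a specific product.

```python
class HealthGate:
    """Keep a receiver in or out of rotation with hysteresis.

    A receiver leaves rotation after `fail_after` consecutive failed checks
    and re-enters after `recover_after` consecutive passed checks. A single
    contrary result never flips the state.
    """

    def __init__(self, fail_after=3, recover_after=2):
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.in_rotation = True
        self._streak = 0  # consecutive checks that contradict the current state

    def observe(self, check_passed: bool) -> bool:
        contradicts = check_passed != self.in_rotation
        self._streak = self._streak + 1 if contradicts else 0
        needed = self.fail_after if self.in_rotation else self.recover_after
        if self._streak >= needed:
            self.in_rotation = not self.in_rotation
            self._streak = 0
        return self.in_rotation


gate = HealthGate()
checks = [False, True, False, False, False, True, True]
states = [gate.observe(passed) for passed in checks]
print(states)
```

The first isolated failure does not remove the receiver, three failures in a row do, and two passes in a row bring it back. Larger thresholds reduce flapping but increase feedback lag, so the two failure modes trade off against each other.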
Worked Example¶
A web application runs six server instances behind a single public endpoint. Initially, traffic is sent to the first available instance in a static order. During a traffic spike, two instances receive most of the requests because of session stickiness and stale routing rules. Their latency rises sharply while other instances remain underutilized.
The team implements Load Balancing.
- A balancing layer observes server health and response time.
- Requests are assigned using weighted least-connections rather than static ordering.
- Unhealthy instances are removed from rotation.
- Session affinity is preserved only when required.
- High-priority requests can use a protected routing class.
- The team monitors load skew, latency, and failed assignment rate.
The intervention does not reduce total demand. It changes assignment. The application handles the spike better because work is distributed across available capacity instead of concentrating on a few overloaded instances.
The key move is not simply adding more servers. It is routing flow according to capacity and health so available capacity is actually used.
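The selection rule in this example can be sketched as follows. This is an illustrative version of weighted least-connections under assumed data: the instance names, weights, and connection counts are invented, and a production balancer would also handle concurrency, connection completion, and affinity.

```python
def weighted_least_connections(instances):
    """Pick the healthy instance with the fewest active connections per unit weight."""
    candidates = [i for i in instances if i["healthy"] and i["weight"] > 0]
    if not candidates:
        raise RuntimeError("no healthy instance in rotation")
    return min(candidates, key=lambda i: i["active"] / i["weight"])


# Six hypothetical instances behind one endpoint.
instances = [
    {"name": f"web-{n}", "weight": 1, "active": 0, "healthy": True}
    for n in range(1, 7)
]
instances[0]["active"] = 40      # two instances are already hot
instances[1]["active"] = 35
instances[4]["weight"] = 2       # assumed to have twice the capacity
instances[5]["healthy"] = False  # failed health check, out of rotation

# Route 30 new requests; none of them complete during the sketch.
for _ in range(30):
    chosen = weighted_least_connections(instances)
    chosen["active"] += 1

for inst in instances:
    print(inst["name"], inst["active"])
```

The new requests land on the three underused healthy instances, with the higher-weight instance taking the largest share, while the two hot instances and the unhealthy one receive none.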
Cross-Domain Instances¶
- Software and web services — Requests are distributed across servers, instances, regions, or services to reduce hot spots and improve availability.
- Networking and traffic routing — Network traffic is distributed across links, routes, or endpoints based on capacity, latency, policy, or health.
- Compute and job scheduling — Jobs are assigned across processors, machines, clusters, or workers according to capacity and suitability.
- Customer support and operations — Tickets or calls are routed to agents or teams based on availability, skill, priority, or current load.
- Logistics and supply chains — Orders, shipments, inventory, or deliveries are assigned across warehouses, routes, carriers, or fulfillment centers.
- Power and energy systems — Electrical load is distributed across generators, lines, storage, or demand-side resources to preserve stability.
- Healthcare or public-service intake — Patients, cases, or requests are routed across facilities, desks, or teams to reduce local overload.
- Organizational work assignment — Tasks are distributed across people or teams to avoid localized burnout or bottlenecks.
- Platform market matching — Demand is matched across available providers, drivers, hosts, workers, or sellers to improve utilization and responsiveness.
These examples are structurally related because each routes or assigns flow across multiple viable receivers to reduce localized overload and improve use of available capacity.
Notes¶
Load Balancing should be reviewed alongside Flow Diversion / Rerouting, Failover, Backpressure, Rate Limiting, Buffering, Bulkhead Isolation, Scheduling, Queueing, and Resource Allocation.
The main conceptual risk is collapse into nearby concepts:
- If the entry emphasizes switching after primary failure, it becomes Failover.
- If the entry emphasizes diverting away from a specific blocked or harmful path, it becomes Flow Diversion / Rerouting.
- If the entry emphasizes capping flow, it becomes Rate Limiting.
- If the entry emphasizes upstream slowing, it becomes Backpressure.
- If the entry emphasizes temporary holding, it becomes Buffering.
- If the entry emphasizes partitioning failure domains, it becomes Bulkhead Isolation.
- If the entry merely orders work in time, it may become Scheduling or Queueing.
The current entry uses routing_rule, distribution_policy, resource_pooling, and capacity_signal as solution-side labels. These may need later normalization as lower-level archetypal components, prime abstractions, mechanisms, or informal component labels.