
// Section 4.2 · Protocol

Capacity-Aware Queueing

// 2 of 4 · request routing

Priority by complexity, not arrival order.

// 4.2 · capacity-aware queueing · signal 2 of 4

Higher-complexity requests are not blocked behind lower-complexity ones.

After capability filtering, the surviving node pool has one queue. Requests enter that queue ordered by complexity, not by arrival time. On the live AI path each entry is a whole inference request, dispatched intact to one node; on the general parallelizable-workload path each entry is a sub-task. Each node keeps its own active-capacity counter, and the scheduler never dispatches to a node past its declared limit.

Per-node capacity counter

// each node carries its own limit · scheduler respects declared maximum

nd_4f7c   3 / 4   ACCEPTING
nd_8e91   4 / 4   SATURATED
nd_c027   2 / 8   ACCEPTING
nd_d104   1 / 4   ACCEPTING

// capacity counter is per-node · the scheduler skips nd_8e91 entirely while it sits at the max
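
The per-node counter above can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler's actual implementation: the names `Node`, `pick_node`, and `dispatch` are hypothetical, and picking the node with the most free slots is an assumption made only for the sketch. The source states just that a saturated node is skipped and that no node is pushed past its declared limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: str
    active: int  # requests currently running on this node
    limit: int   # declared maximum the scheduler must respect

    @property
    def accepting(self) -> bool:
        # A node at its limit is SATURATED and must be skipped.
        return self.active < self.limit


def pick_node(pool: List[Node]) -> Optional[Node]:
    """Return an accepting node, or None when every node is saturated.

    Preferring the node with the most free slots is an assumption of this
    sketch; the source does not say how accepting nodes are ranked.
    """
    candidates = [n for n in pool if n.accepting]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.limit - n.active)


def dispatch(pool: List[Node]) -> Optional[str]:
    """Reserve one slot on a chosen node and return its id, or None."""
    node = pick_node(pool)
    if node is None:
        return None
    node.active += 1  # the counter never exceeds node.limit
    return node.node_id


# The four nodes from the panel above.
pool = [
    Node("nd_4f7c", 3, 4),
    Node("nd_8e91", 4, 4),
    Node("nd_c027", 2, 8),
    Node("nd_d104", 1, 4),
]
```

With this pool, `nd_8e91` is never returned by `pick_node` while it sits at 4 / 4, and `dispatch` returns `None` once every node has reached its limit, so the caller leaves the entry in the queue instead of over-subscribing.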

Priority queue order

// complexity-ranked · not FIFO

// ORDER   COMPLEXITY                                  ARRIVAL    DISPATCH ETA
// 1       req_a3f1 large-context inference request    14:03:12   next
// 2       sub_b2c0 mid render frame                   14:03:05   next-of-tier
// 3       sub_c1d8 small ETL partition                14:03:08   queued
// 4       req_d4e2 short inference request            14:03:11   queued
// 5       sub_e5f9 small data window                  14:03:14   queued

// the heaviest request arrived fourth of five but dispatches FIRST · complexity-ranked ordering keeps it from waiting behind lower-complexity work
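
The ordering in the table can be reproduced with a small heap-based queue. This is a sketch under stated assumptions: the numeric complexity scores (3 for large, 2 for mid, 1 for small and short) are hypothetical, since the source gives only labels, and breaking ties by arrival time is inferred from rows 3 to 5 rather than stated. The helper names `push` and `drain` are illustrative.

```python
import heapq
from typing import List, Tuple

# Heap item: (negated complexity, arrival time, entry id).
# Negating complexity makes the min-heap pop the heaviest entry first;
# arrival time breaks ties within a tier (an assumption of this sketch).
QueueItem = Tuple[int, str, str]


def push(queue: List[QueueItem], entry_id: str, complexity: int, arrival: str) -> None:
    heapq.heappush(queue, (-complexity, arrival, entry_id))


def drain(queue: List[QueueItem]) -> List[str]:
    """Pop every entry and return the ids in dispatch order."""
    order = []
    while queue:
        _, _, entry_id = heapq.heappop(queue)
        order.append(entry_id)
    return order


queue: List[QueueItem] = []
# Entries are pushed in arrival order; the scores are hypothetical.
push(queue, "sub_b2c0", 2, "14:03:05")  # mid render frame
push(queue, "sub_c1d8", 1, "14:03:08")  # small ETL partition
push(queue, "req_d4e2", 1, "14:03:11")  # short inference request
push(queue, "req_a3f1", 3, "14:03:12")  # large-context inference request
push(queue, "sub_e5f9", 1, "14:03:14")  # small data window

dispatch_order = drain(queue)
```

Draining the queue yields `req_a3f1` first even though three lighter entries arrived before it, which matches the ORDER column above.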