From f8370941faf26317e478b5d3b5a61c7557f3eddc Mon Sep 17 00:00:00 2001
From: Cassandra Sziklai <142637613+csziklai@users.noreply.github.com>
Date: Fri, 9 Aug 2024 12:37:22 -0400
Subject: [PATCH] Update documentation: New style PIFOs, RR, and Strict (#2237)

This PR makes progress towards https://github.com/calyxir/calyx/issues/2191
by adding documentation about round-robin and strict queues.

Additionally, it completes the last checkbox of
https://github.com/calyxir/calyx/issues/2226 by updating the documentation
to use the new style of PIFOs.

It also updates the documentation to reflect the removal of `peek` (#2241).

Because of a git mistake carried forward, this PR supersedes #2223.

---------

Co-authored-by: Anshuman Mohan
---
 docs/frontends/queues.md | 86 +++++++++++++++++++++++++++++-----------
 1 file changed, 62 insertions(+), 24 deletions(-)

diff --git a/docs/frontends/queues.md b/docs/frontends/queues.md
index 6e3bbab5cd..9c2fe769f8 100644
--- a/docs/frontends/queues.md
+++ b/docs/frontends/queues.md
@@ -3,8 +3,7 @@ A queue is a standard data structure that maintains a set of elements in some total order.
 A new element is added to the queue using the `enqueue` operation, which is also known as `push` or `insert` in some contexts.
 Because of the total order, some element of the queue is the _most favorably ranked_ element.
-We can read this element using the `peek` operation.
-We can also remove this element from the queue using the `dequeue` operation, which is also known as `pop` or `remove` in some contexts.
+We can remove this element from the queue using the `dequeue` operation, which is also known as `pop` or `remove` in some contexts.

 We provision four types of queues in Calyx.
 The first three follow the same shared interface, while the fourth follows a slightly extended interface.
 The frontend is implemented using the [Calyx builder library][builder], and the source code is heavily commented.
@@ -17,12 +16,11 @@ All queues in Calyx expose the same interface.
 - Input port `cmd`, a 2-bit integer.
 Selects the operation to perform:
   - `0`: `pop`.
-  - `1`: `peek`.
-  - `2`: `push`.
+  - `1`: `push`.
 - Input port `value`, a 32-bit integer.
 The value to push.
 - Register `ans`, a 32-bit integer that is passed to the queue by reference.
-If `peek` or `pop` is selected, the queue writes the result to this register.
+If `pop` is selected, the queue writes the result to this register.
 - Register `err`, a 1-bit integer that is passed to the queue by reference.
 The queue raises this flag in case of overflow or underflow.

@@ -81,26 +79,64 @@ Using a circular buffer usually entails incrementing indices modulo the buffer s
 We use a trick to avoid this: we require the FIFO's length to be a power of two, say `2^k`, and we use adders of width `k` to increment the indices.
 This means we can just naively increment the indices forever and the wrap-around behavior we want is automatically provided by overflow.

-## PIFO
+## Specialized PIFOs

 A more complex instance is the priority queue.
 At time of enqueue, an element is associated with a priority.
 The most favorably ranked element is the one with the highest priority.
-Two elements may be pushed with the same priority.
-A priority queue that is additionally defined to break such ties in FIFO order is called a _push in, first out_ (PIFO) queue.
+Two elements may be pushed with the same priority; a priority queue that is additionally defined to break such ties in FIFO order is called a _push in, first out_ (PIFO) queue.

-Our queue frontend generates a simple PIFO in Calyx; the source code is available [here][pifo.py].
+We describe PIFOs in the general sense (i.e., queues that accept `(item, rank)` pairs and order them by `rank`) [below](#minimum-binary-heap). For now, let's focus on specialized PIFOs that have a policy "baked in" to the queue itself.
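The power-of-two trick for the FIFO's circular buffer, described in the context lines above, can be illustrated with a small plain-Python model. This is a sketch under stated assumptions, not the Calyx builder code:

- Masking an index with `2^k - 1` stands in for a `k`-bit adder that overflows.
- `invoke` is a hypothetical helper that mimics the shared `cmd`/`value`/`ans`/`err` interface. It returns `ans` and `err` instead of writing them to registers.
- The explicit `length` counter, used to tell a full buffer from an empty one, is our assumption about how occupancy is tracked.

```python
from typing import Optional, Tuple


class PowerOfTwoFifo:
    """Software model of a FIFO backed by a circular buffer of length 2**k.

    Read/write indices behave like k-bit counters: masking with 2**k - 1
    after each increment mimics a k-bit adder overflowing, so no modulo
    operation is needed.
    """

    def __init__(self, k: int) -> None:
        self.capacity = 1 << k
        self.mask = self.capacity - 1
        self.mem = [0] * self.capacity
        self.read = 0    # slot holding the oldest element
        self.write = 0   # next free slot
        self.length = 0  # assumption: explicit count to tell "full" from "empty"

    def invoke(self, cmd: int, value: int = 0) -> Tuple[Optional[int], bool]:
        """Hypothetical stand-in for the queue interface: cmd 0 = pop, cmd 1 = push.

        Returns (ans, err): ans is the popped value (None otherwise),
        err is True on overflow or underflow.
        """
        if cmd == 0:  # pop
            if self.length == 0:
                return None, True  # underflow
            ans = self.mem[self.read]
            self.read = (self.read + 1) & self.mask  # k-bit wrap-around
            self.length -= 1
            return ans, False
        if cmd == 1:  # push
            if self.length == self.capacity:
                return None, True  # overflow
            self.mem[self.write] = value
            self.write = (self.write + 1) & self.mask  # k-bit wrap-around
            self.length += 1
            return None, False
        raise ValueError(f"unknown cmd {cmd}")
```

With `k = 2`, the fifth push overflows. After one pop, the next push lands in slot 0 again, purely through the masked increment.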

-Curiously, our PIFO has a ranking policy "baked in": it partitions incoming elements into two classes, and tries to emit elements from those two classes in a round-robin fashion.
-The PIFO operates in a work-conserving manner: if there are no elements from one class and there are elements from the other, we emit an element from the latter class even if it is not its turn.
+We have two types of specialized PIFOs, round-robin and strict, which implement policies that determine which flow to pop from next. These PIFOs are parameterized over the number of flows, `n`, that they can arbitrate between.

-Internally, our PIFO maintains two sub-queues, one for each class.
-It also has a boundary value, which informs its partition policy: elements less than the boundary go to the first class, and other elements go to the second class.
-Control logic for pushing a new element is straightforward.
-The control logic for peeking and popping is more subtle because this is where the round-robin policy is enforced.
-A register tracks the class that we wish to emit from next.
-The register starts arbitrarily, and is updated after each successful emission from the desired class.
-It is left unchanged in the case when the desired class is empty and an element of the other class is emitted in the interest of work conservation.
+
+### Round-Robin Queues
+
+Round-robin queues are PIFOs generalized to `n` flows that operate in a
+work-conserving round-robin fashion. That is, if a flow is silent when it is its turn, that flow
+simply skips its turn and the next flow is offered service.
+
+Internally, a round-robin queue maintains `n` subqueues.
+It takes a list `boundaries` of length `n`, which the
+client uses to divide the incoming traffic into `n` flows.
+For example, if `n = 3` and the client passes boundaries `[133, 266, 400]`,
+packets will be divided into three flows according to the intervals: `[0, 133]`, `[134, 266]`, `[267, 400]`.
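The work-conserving round-robin behavior just described can be modeled in plain Python, together with the `hot` pointer that the document's `pop` bullet introduces. This is a hypothetical software sketch, not `gen_strict_or_rr.py`, and it makes several simplifications:

- The subqueues are unbounded `deque`s rather than fixed-size Calyx queues.
- `bisect_left` stands in for however the generated hardware compares a value against `boundaries`.
- Out-of-range values are simply rejected.
- We assume `hot` also advances after a successful pop, so that the next flow gets the next turn.

```python
from bisect import bisect_left
from collections import deque
from typing import Deque, List, Optional, Sequence


class RoundRobinQueue:
    """Software model of a work-conserving round-robin PIFO over n flows.

    Flow i holds values in (boundaries[i-1], boundaries[i]]; flow 0 starts at 0.
    """

    def __init__(self, boundaries: Sequence[int]) -> None:
        self.boundaries = list(boundaries)
        self.subqueues: List[Deque[int]] = [deque() for _ in self.boundaries]
        self.hot = 0  # index of the flow whose turn it is

    def push(self, value: int) -> bool:
        """Append value to its flow; return False if it lies past the last boundary."""
        flow = bisect_left(self.boundaries, value)  # first i with value <= boundaries[i]
        if flow == len(self.boundaries):
            return False
        self.subqueues[flow].append(value)
        return True

    def pop(self) -> Optional[int]:
        """Pop from `hot`, skipping silent flows; None means every flow is empty."""
        n = len(self.subqueues)
        for _ in range(n):
            flow = self.hot
            self.hot = (self.hot + 1) % n  # assumption: advance even after a success
            if self.subqueues[flow]:
                return self.subqueues[flow].popleft()
        return None
```

Take boundaries `[133, 266, 400]`, push `10, 20, 150, 300, 310`, and pop five times. The result is `10, 150, 300, 20, 310`: on its second turn flow 1 is silent, so it is skipped and flow 2 is served instead.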
+
+- At `push`, we check the `boundaries` list to determine which flow to push to.
+Take the boundaries example given earlier, `[133, 266, 400]`.
+If we push the value `89`, it will, under the hood, be pushed into subqueue 0 because `0 <= 89 <= 133`,
+and `305` will be pushed into subqueue 2 because `267 <= 305 <= 400`.
+- The queue maintains a `hot` pointer that starts off at 0, meaning the next subqueue to pop from is queue 0.
+At `pop`, we first try to pop from `hot`. If this succeeds, great. If it fails,
+we increment `hot` and therefore continue to check all other flows
+in round-robin fashion.
+
+The source code is available in [`gen_strict_or_rr.py`][gen_strict_or_rr.py], which takes as arguments `n`, `boundaries`, and handles to the subqueues it must administer. It also takes a boolean parameter `round_robin`, which, if `true`, results in the generation of a round-robin queue.
+
+
+### Strict Queues
+
+Strict queues support `n` flows as well, but their flows have a strict order of priority, and it is this order that determines popping
+order. That is, the second-highest-priority subqueue will only be allowed to pop if the highest-priority subqueue is empty.
+If a higher-priority flow is pushed to in the interim, the next call to `pop` will again try to pop from the highest-priority flow.
+
+Like a round-robin queue, a strict queue takes a list `boundaries` of length `n`, which divides the incoming traffic into `n` flows.
+For example, if `n = 3` and the client passes boundaries `[133, 266, 400]`,
+packets will be divided into three flows according to the intervals: `[0, 133]`, `[134, 266]`, `[267, 400]`.
+
+It also takes a list `order` of length `n`, which specifies the order
+of priority of the flows. For example, if `n = 3` and the client passes order
+`[1, 2, 0]`, then flow 1 (packets in range `[134, 266]`) has first priority, flow 2
+(packets in range `[267, 400]`) has second priority, and flow 0 (packets in range
+`[0, 133]`) has last priority.
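A strict queue differs from a round-robin queue only in how `pop` picks a flow. The following plain-Python sketch rescans `order` from the top on every `pop`. That rescan is what lets a higher-priority flow that was pushed to in the interim win the next `pop`. The class name is hypothetical and this is not the generated Calyx code; unbounded `deque`s stand in for the subqueues.

```python
from bisect import bisect_left
from collections import deque
from typing import Deque, List, Optional, Sequence


class StrictQueue:
    """Software model of a strict-priority PIFO over n flows.

    boundaries[i] is the inclusive upper bound of flow i (flow 0 starts at 0);
    order lists flow indices from highest to lowest priority.
    """

    def __init__(self, boundaries: Sequence[int], order: Sequence[int]) -> None:
        if sorted(order) != list(range(len(boundaries))):
            raise ValueError("order must be a permutation of 0..n-1")
        self.boundaries = list(boundaries)
        self.order = list(order)
        self.subqueues: List[Deque[int]] = [deque() for _ in self.boundaries]

    def push(self, value: int) -> bool:
        """Append value to its flow; return False if it lies past the last boundary."""
        flow = bisect_left(self.boundaries, value)  # first i with value <= boundaries[i]
        if flow == len(self.boundaries):
            return False
        self.subqueues[flow].append(value)
        return True

    def pop(self) -> Optional[int]:
        """Pop from the highest-priority non-empty flow; None means underflow."""
        # Rescanning from order[0] on every call lets a refilled high-priority flow win.
        for flow in self.order:
            if self.subqueues[flow]:
                return self.subqueues[flow].popleft()
        return None
```

Take boundaries `[133, 266, 400]` and order `[1, 2, 0]`. Pushing `10, 300, 150, 20` and popping twice yields `150` then `300`. If `200` (flow 1) is then pushed, the next `pop` returns `200` before flow 0 is ever served.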
+
+- At `push`, we check the `boundaries` list to determine which flow to push to.
+Take the boundaries example given earlier, `[133, 266, 400]`.
+If we push the value `89`, it will, under the hood, be pushed into subqueue 0 because `0 <= 89 <= 133`,
+and `305` will be pushed into subqueue 2 because `267 <= 305 <= 400`.
+- At `pop`, we first try to pop from `order[0]`. If this succeeds, great. If it fails, we try `order[1]`, and so on.
+
+The source code is available in [`gen_strict_or_rr.py`][gen_strict_or_rr.py], which takes as arguments `n`, `boundaries`, `order`, and handles to the subqueues it must administer. It also takes a boolean parameter `round_robin`, which, if `false`, results in the generation of a strict queue.

 ## PIFO Tree

@@ -112,9 +148,8 @@ A variety of scheduling policies can be realized by manipulating the various pri
 Popping the most favorably ranked element from a PIFO tree is relatively straightforward: popping the root PIFO tells us which child PIFO to pop from next, and we recurse until we reach a leaf PIFO.
 We refer interested readers to [this][sivaraman16] research paper for more details on PIFO trees.

-Our frontend allows for the creation of PIFO trees of any height, but with two restrictions:
-- The tree must be binary-branching.
-- The scheduling policy at each internal node must be _round-robin_.
+Our frontend allows for the creation of PIFO trees of any height and with any number of children per node,
+where the scheduling policy at each internal node is either _round-robin_ or _strict_.

 See the [source code][pifo_tree.py] for an example where we create a PIFO tree of height 2.
 Specifically, the example implements the PIFO tree described in §2 of [this][mohan23] research paper.
@@ -123,19 +158,21 @@ Internally, our PIFO tree is implemented by leveraging the PIFO frontend.
 The PIFO frontend seeks to orchestrate two queues, which in the simple case will just be two FIFOs.
 However, it is easy to generalize those two queues: instead of being FIFOs, they can be PIFOs or even PIFO trees.

+We see a more complex example of a PIFO tree in [`complex_tree.py`][complex_tree.py]. This tree performs round-robin scheduling among three children, two of which are strict queues and one of which is a round-robin queue. This tree has a height of 3. The overall structure is `rr(strict(A, B, C), rr(D, E, F), strict(G, H))`.
+
 ## Minimum Binary Heap

 A minimum binary heap is another tree-shaped data structure where each node has at most two children.
 However, unlike the queues discussed above, a heap exposes an extended interface: in addition to the input ports and reference registers discussed above, a heap has an additional input `rank`.
 The `push` operation now accepts both a `value` and the `rank` that the user wishes to associate with that value.
-Consequently, a heap _orders_ its elements by `rank`, with the `pop` (resp. `peek`) operation set to remove (resp. read) the element with minimal rank.
+Consequently, a heap _orders_ its elements by `rank`, with the `pop` operation set to remove the element with minimal rank.
 To maintain this ordering efficiently, a heap stores `(rank, value)` pairs in each node and takes special care to maintain the following invariant:

 > **Min-Heap Property**: for any given node `C`, if `P` is a parent of `C`, then the rank of `P` is less than or equal to the rank of `C`.

 To `push` or `pop` an element is easy at the top level: write to or read from the correct node, and then massage the tree to restore the Min-Heap Property.
-The `peek` operation is constant-time and `push` and `pop` are logarithmic in the size of the heap.
+The `push` and `pop` operations are logarithmic in the size of the heap.

 Our frontend allows for the creation of minimum binary heaps in Calyx; the source code is available in [`binheap.py`][binheap.py].

@@ -148,7 +185,6 @@ Our `stable_binheap` is a heap accepting 32-bit ranks and values.
 It uses a counter `i` and instantiates, in turn, a binary heap that accepts 64-bit ranks and 32-bit values.
 - To push a pair `(r, v)` into `stable_binheap`, we craft a new 64-bit rank that incorporates the counter `i` (specifically, we compute `(r << 32) + i`), and we push `v` into our underlying binary heap with this new 64-bit rank. We also increment the counter `i`.
 - To pop `stable_binheap`, we pop the underlying binary heap.
-- To peek `stable_binheap`, we peek the underlying binary heap.

 The source code is available in [`stable_binheap.py`][stable_binheap.py].

@@ -169,3 +205,5 @@ The source code is available in [`stable_binheap.py`][stable_binheap.py].
 [gen_queue_data_expect.sh]: https://github.com/calyxir/calyx/blob/main/calyx-py/calyx/gen_queue_data_expect.sh
 [queue_call.py]: https://github.com/calyxir/calyx/blob/main/calyx-py/calyx/queue_call.py
 [runt-queues]: https://github.com/calyxir/calyx/blob/a4c2442675d3419be6d2f5cf912aa3f804b3c4ab/runt.toml#L131-L144
+[gen_strict_or_rr.py]: https://github.com/calyxir/calyx/blob/main/calyx-py/test/correctness/queues/strict_and_rr_queues/gen_strict_or_rr.py
+[complex_tree.py]: https://github.com/calyxir/calyx/blob/main/calyx-py/test/correctness/queues/complex_tree.py
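The counter trick behind `stable_binheap` can be checked with a plain-Python model. It uses the standard library's `heapq` in place of `binheap.py`; `heapq` is a binary min-heap that maintains the Min-Heap Property with logarithmic `heappush`/`heappop`. The class and method names are hypothetical.

Note the parentheses in `(r << 32) + i`: in Python, as in C, `+` binds tighter than `<<`, so `r << 32 + i` would parse as `r << (32 + i)`. We also let the 32-bit counter wrap as a hardware register would, which is an assumption. Stability is then only guaranteed until the counter wraps.

```python
import heapq
from typing import List, Optional, Tuple

MASK32 = (1 << 32) - 1


class StableBinHeap:
    """Software model of a stable min-heap with 32-bit ranks and values.

    Each push packs (rank, counter) into one 64-bit key, so elements with
    equal rank pop in the order they were pushed.
    """

    def __init__(self) -> None:
        self.heap: List[Tuple[int, int]] = []  # (64-bit key, value)
        self.i = 0  # push counter, kept to 32 bits

    def push(self, r: int, v: int) -> None:
        key = ((r & MASK32) << 32) + self.i  # high 32 bits: rank, low 32 bits: counter
        heapq.heappush(self.heap, (key, v))
        self.i = (self.i + 1) & MASK32  # assumption: counter wraps like a 32-bit register

    def pop(self) -> Optional[int]:
        """Return the value with minimal (rank, push order), or None if empty."""
        if not self.heap:
            return None
        _, v = heapq.heappop(self.heap)
        return v
```

Pushing `(5, 100)`, `(3, 200)`, `(5, 300)`, `(3, 400)` pops `200, 400, 100, 300`. The ranks come out in ascending order, and elements with equal ranks leave in push order.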