..
    comment:: SPDX-License-Identifier: MIT
    comment:: Copyright (C) 2025 Advanced Micro Devices, Inc

##############
Memory Model
##############

The AMD Alveo V80 board has two distinct memory subsystems, DDR and HBM, each
with different capacity, bandwidth, and access characteristics. This document
explains how SLASH models these subsystems and how the runtime allocator
manages device memory.

DDR Memory
==========

The V80 has a single DDR address space, accessed through the QDMA subsystem
(PCIe Physical Function 1). DDR offers large capacity and is suitable for bulk
data storage where bandwidth is not the primary concern.

In VRT, DDR memory is selected with ``MemoryRangeType::DDR``:

.. code-block:: cpp

    vrt::Buffer buffer(device, size, vrt::MemoryRangeType::DDR);

In the linker configuration, DDR is referenced as ``DDR0``:

.. code-block:: ini

    sp=offset_0.m_axi_gmem0:DDR0

HBM (High Bandwidth Memory)
============================

The V80 includes HBM organized as 64 pseudo-channels (HBM0–HBM63). Each channel
provides independent bandwidth, and the aggregate bandwidth across all channels
is substantially higher than DDR.

There are two access modes:

Port-Based Access
-----------------

``MemoryRangeType::HBM`` with an explicit port number allocates on a specific
HBM channel. The kernel port must be mapped to the same channel via the ``sp=``
directive in the linker configuration.

.. code-block:: cpp

    // Allocate on HBM channel 1
    vrt::Buffer buffer(device, size, vrt::MemoryRangeType::HBM, 1);

.. code-block:: ini

    # Linker config must match
    sp=increment_0.m_axi_gmem0:HBM1

Internally, the port number maps to an ``HBMRegion`` enum value (``HBM0``
through ``HBM63``) and the allocation type is set to ``BufferAllocType::Hbm``.

.. note::

    Constructing a buffer with ``MemoryRangeType::HBM`` but *without* a port
    throws ``std::invalid_argument``. HBM always requires an explicit channel
    unless you use VNOC.

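
The port rule described in this section can be illustrated with a small,
self-contained sketch. Everything in the ``sketch`` namespace is a hypothetical
stand-in written for this example and is not the VRT implementation; in
particular, the range check against 64 channels and the ``std::out_of_range``
exception are assumptions, since the text above only documents the
``std::invalid_argument`` case.

.. code-block:: cpp

    #include <cstdint>
    #include <optional>
    #include <stdexcept>

    namespace sketch {  // hypothetical stand-ins, not the real vrt:: types

    enum class MemoryRangeType { DDR, HBM, HBM_VNOC };

    // HBM0..HBM63, as stated in the text above.
    constexpr std::uint32_t kHbmChannelCount = 64;

    // Returns the HBM channel a request targets, or std::nullopt when no
    // single channel applies (DDR and HBM_VNOC).
    constexpr std::optional<std::uint32_t> resolveHbmChannel(
        MemoryRangeType type, std::optional<std::uint32_t> port) {
        if (type != MemoryRangeType::HBM) {
            return std::nullopt;
        }
        if (!port.has_value()) {
            // Mirrors the documented behaviour: HBM without a port is an error.
            throw std::invalid_argument("MemoryRangeType::HBM requires a port");
        }
        if (*port >= kHbmChannelCount) {
            // Assumption: out-of-range ports are rejected in this sketch.
            throw std::out_of_range("HBM port must be in the range 0..63");
        }
        return port;
    }

    }  // namespace sketch

With these stand-ins, a request for ``HBM`` with port 1 resolves to channel 1,
a request for ``HBM_VNOC`` resolves to no specific channel, and a request for
``HBM`` without a port throws.
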
VNOC (Virtual NoC) Access
--------------------------

``MemoryRangeType::HBM_VNOC`` allocates across multiple HBM channels using the
on-chip Virtual Network-on-Chip, aggregating bandwidth without requiring the
application to manage individual channels.

.. code-block:: cpp

    vrt::Buffer buffer(device, size, vrt::MemoryRangeType::HBM_VNOC);

The allocation type is set to ``BufferAllocType::HbmVnoc`` and no specific
``HBMRegion`` is selected.

MemoryConfig and Port Mapping
==============================

Rather than specifying memory types and ports manually, the recommended
approach is to use ``MemoryConfig``, a struct that carries both the
``MemoryRangeType`` and an optional HBM port number:

.. code-block:: cpp

    struct MemoryConfig {
        MemoryRangeType type;
        std::optional hbmPort;
    };

Obtain a ``MemoryConfig`` from the kernel:

.. code-block:: cpp

    // By port name
    vrt::MemoryConfig portConfig = kernel.portMemoryConfig("m_axi_gmem0");

    // By argument name
    vrt::MemoryConfig argConfig = kernel.argMemoryConfig("in");

These methods parse the ``system_map.xml`` inside the vrtbin to determine which
memory type and channel the kernel port is connected to. The returned config
can be passed directly to the ``Buffer`` constructor:

.. code-block:: cpp

    vrt::Buffer buffer(device, size, kernel.argMemoryConfig("in"));

This ensures the buffer allocation always matches the linker configuration.

Buddy Allocator
===============

On hardware, VRT uses a three-tier buddy-system allocator to manage device
memory efficiently. Each tier handles a different size range:

**SmallBlock** (4 KB – 2 MB)
    Managed by ``BuddySuperblockBase<12, 21>``. Allocations are carved from a
    2 MB superblock using power-of-two splitting.

**MediumBlock** (2 MB – 64 MB)
    Managed by ``BuddySuperblockBase<21, 26>``. Allocations are carved from a
    64 MB superblock.

**LargeBlock** (> 64 MB)
    Allocated directly from vrtd as a standalone DMA buffer, bypassing the
    buddy system.

When a buffer is allocated:

1. The size is rounded up to the nearest power of two.
2. The allocator searches for the smallest available block that fits.
3. If the available block is larger than needed, it is split in half
   repeatedly until the target size is reached. The unused halves are returned
   to the free list.

When a buffer is freed:

1. The allocator checks if the freed block's *buddy* (the other half from the
   original split) is also free.
2. If so, the two halves are coalesced back into a single larger block.
3. This continues up the hierarchy until no more buddies can be merged.

This approach limits external fragmentation while keeping allocation and
deallocation fast, at the cost of some internal fragmentation from rounding
sizes up to a power of two.

Platform Differences
====================

The memory model is designed to be transparent across all three SLASH
platforms, but the underlying mechanisms differ:

**Hardware**
    Real DMA allocations through the vrtd daemon, libslash, and the kernel
    driver. ``sync()`` triggers QDMA transfers between host and device memory.
    The buddy allocator manages physical address space.

**Emulation**
    Fake physical addresses are assigned starting at ``0x4000000000`` (HBM)
    and ``0x60000000000`` (DDR). Buffer data is exchanged with the C-model via
    ZeroMQ IPC. No real DMA occurs.

**Simulation**
    Same fake address scheme as emulation. Buffer data is exchanged with the
    Verilog simulation via ZeroMQ. The address windows match the simulation
    memory map configured in the linker's ``run_pre.tcl``.

In all cases, the ``Buffer`` API (construction, ``sync()``, ``operator[]``) is
identical. Application code does not need to change when switching platforms.
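
As a supplement to the Buddy Allocator section, the size arithmetic it
describes can be sketched in a few lines. This is a minimal illustration under
stated assumptions, not the VRT allocator: all names in the ``sketch``
namespace are hypothetical, requests smaller than 4 KB are assumed to round up
to 4 KB, a block of exactly 2 MB is assumed to belong to the small tier, and a
block of exactly 64 MB to the medium tier. The buddy of a block is found with
the standard XOR technique on the block's offset within its superblock.

.. code-block:: cpp

    #include <cstdint>

    namespace sketch {  // hypothetical names, not the VRT implementation

    constexpr std::uint64_t kSmallMin  = 1ull << 12;  // 4 KB
    constexpr std::uint64_t kSmallMax  = 1ull << 21;  // 2 MB
    constexpr std::uint64_t kMediumMax = 1ull << 26;  // 64 MB

    enum class Tier { Small, Medium, Large };

    // Round a request up to the next power of two, with a 4 KB floor.
    // The second loop condition stops the shift from overflowing to zero.
    constexpr std::uint64_t roundUpBlock(std::uint64_t size) {
        std::uint64_t block = kSmallMin;
        while (block < size && block < (1ull << 63)) {
            block <<= 1;
        }
        return block;
    }

    // Pick the tier that would serve a request of the given size.
    // Assumption: boundary sizes belong to the lower tier.
    constexpr Tier tierFor(std::uint64_t size) {
        const std::uint64_t block = roundUpBlock(size);
        if (block <= kSmallMax) return Tier::Small;
        if (block <= kMediumMax) return Tier::Medium;
        return Tier::Large;
    }

    // Offset of a block's buddy inside its superblock. The offset must be
    // a multiple of blockSize, and blockSize must be a power of two.
    constexpr std::uint64_t buddyOffset(std::uint64_t offset,
                                        std::uint64_t blockSize) {
        return offset ^ blockSize;
    }

    }  // namespace sketch

For example, a 5000-byte request rounds up to an 8 KB block in the small tier,
and the 4 KB block at offset ``0x3000`` has its buddy at offset ``0x2000``;
when both are free they coalesce into the 8 KB block at ``0x2000``.
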