alloc: virtual regions and vpage allocator. #10281
lgirdwood wants to merge 2 commits into thesofproject:main
Conversation
@jsarha fyi - this will be needed by your updates.
zephyr/lib/pacovr.c
Outdated
	return;

	/* simple free, just increment free count, this is for tuning only */
	p->batch_free_count++;

should this be decremented during allocator's alloc() call?

no, the intention here is that we should be able to see that if this is > 0 then there is room for heap optimization.
zephyr/lib/vpages.c
Outdated
	/* map the virtual blocks in virtual region to free physical blocks */
	ret = sys_mm_drv_map_region_safe(page_context.virtual_region, vaddr,
					 0, pages * CONFIG_MM_DRV_PAGE_SIZE, SYS_MM_MEM_PERM_RW);
this is allocating physical pages from anywhere in L2 memory?
Free physical pages from L2 SRAM
Free physical pages from L2 SRAM
@lgirdwood right, but shouldn't this be allocating from a pre-allocated set of pages for this "heap" or am I misunderstanding the concept?
zephyr/lib/pacovr.c
Outdated
	/* allocate memory */
	ptr = p->batch_ptr;
	p->batch_ptr += size;
	p->batch_used += size;

are "batch" and "scratch" commonly used allocator names? I found a couple of references to scratch allocators, e.g. https://stackoverflow.com/questions/74956010/what-are-the-differences-between-block-stack-and-scratch-allocators#75044083 but there it's defined as "once allocated, never freed", similar to your batch allocator?

and I've also found scratch memory described as "temporary", but let's rename based on use, i.e. static and dynamic.
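For context, a minimal sketch of the linear ("static"/batch-style) allocator pattern being discussed — the struct and names here are illustrative, not the PR's actual code:

#include <stddef.h>
#include <stdint.h>

/* Illustrative bump allocator: memory is carved from a fixed region by
 * advancing a pointer; objects are never individually freed, only the
 * whole region is reset or destroyed at end of life.
 */
struct linear_heap {
	uint8_t *base;	/* start of backing region */
	size_t size;	/* total region size */
	size_t used;	/* bytes handed out so far */
};

static void *linear_alloc(struct linear_heap *h, size_t bytes)
{
	if (h->used + bytes > h->size)
		return NULL;	/* out of space */

	void *ptr = h->base + h->used;

	h->used += bytes;	/* bump; never reclaimed per-object */
	return ptr;
}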
	switch (container->type) {
	case MOD_RES_HEAP:
#if CONFIG_SOF_PACOVR
		/* do we need to use the scratch heap or the batch heap? */

I do not think this would work. It's not guaranteed that memory allocated in the initialization phase is also freed in the initialization phase. At the moment there are modules freeing memory in the initialized phase that was allocated during the init phase.
It should not be too hard to write a single pacovr_free() that knows from the pointer's memory range whether it was allocated from the scratch or batch (=> dynamic or static) heap.
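A sketch of that single free with range-based dispatch — the pacovr field names are assumptions, and sys_heap is just a stand-in for whatever backs the dynamic heap:

#include <zephyr/sys/sys_heap.h>

/* Sketch: route free() by address range. Pointers inside the static
 * (batch) region need no per-object reclaim; anything else is assumed
 * to belong to the dynamic (scratch) heap.
 */
static void pacovr_free_any(struct pacovr *p, void *ptr)
{
	uintptr_t addr = (uintptr_t)ptr;

	if (addr >= (uintptr_t)p->static_base &&
	    addr < (uintptr_t)p->static_base + p->static_size)
		return;	/* static/linear heap: freed only as a whole */

	sys_heap_free(&p->dynamic_heap, ptr);
}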
@jsarha updated, can you try this with your topology patch for pipeline memory. Thanks!
Pull Request Overview
This PR introduces a new PACOVR (Pre Allocated COntiguous Virtual memory Region) memory allocator system for pipeline resource management. The system provides virtual page allocation with both dynamic heap and static linear allocators built on top.
- Adds virtual page allocator for contiguous memory regions
- Implements PACOVR allocator with dynamic heap and static allocation capabilities
- Integrates PACOVR into module allocation functions and pipeline lifecycle
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| zephyr/lib/vpages.c | Virtual page allocator implementation with mapping/unmapping functionality |
| zephyr/lib/pacovr.c | PACOVR allocator with dynamic heap and static linear allocation |
| zephyr/include/sof/lib/vpage.h | API header for virtual page allocator |
| zephyr/include/sof/lib/pacovr.h | API header for PACOVR allocator |
| zephyr/Kconfig | Configuration options for PACOVR and virtual page elements |
| zephyr/CMakeLists.txt | Build system integration for new virtual memory files |
| src/include/sof/audio/pipeline.h | Pipeline structure extension to include PACOVR instance |
| src/audio/pipeline/pipeline-graph.c | Pipeline creation/destruction integration with PACOVR |
| src/audio/module_adapter/module/generic.c | Module allocation functions updated to use PACOVR |
| scripts/xtensa-build-zephyr.py | Additional symlink creation for firmware deployment |
zephyr/lib/vpages.c
Outdated
void vpage_free(void *ptr)
{
	k_mutex_lock(&page_context.lock, K_FOREVER);
	vpages_free_and_unmap((uintptr_t *)ptr);
The variable 'ret' is not in scope here. The return value from vpages_free_and_unmap() should be captured and checked.
-	vpages_free_and_unmap((uintptr_t *)ptr);
+	int ret = vpages_free_and_unmap((uintptr_t *)ptr);
 * @brief Free virtual pages
 * Frees previously allocated virtual memory pages and unmaps them.
 *
 * @param ptr
Missing documentation for the ptr parameter in the function comment block.
- * @param ptr
+ * @param ptr Pointer to the start of the virtual memory pages to be freed.
+ *            Must be page-aligned and previously allocated by the virtual page allocator.
zephyr/include/sof/lib/pacovr.h
Outdated
 * @param[in] batch_size Size of the PACOVR batch region.
 * @param[in] scratch_size Size of the scratch heap.
The parameter names in the documentation (batch_size, scratch_size) don't match the actual function parameter names (static_size, dynamic_size) in the implementation.
zephyr/lib/pacovr.c
Outdated
	size_t total_size;

	if (!static_size || !dynamic_size) {
		LOG_ERR("error: invalid pacovr static size %d or dynamic size %d",
Using %d format specifier for size_t variables. Should use %zu for size_t to avoid potential format string issues.
| LOG_ERR("error: invalid pacovr static size %d or dynamic size %d", | |
| LOG_ERR("error: invalid pacovr static size %zu or dynamic size %zu", |
zephyr/lib/pacovr.c
Outdated
	/* check we have enough static space left */
	if (p->static_used + size > p->static_size) {
		LOG_ERR("error: pacovr static alloc failed for %d bytes, only %d bytes free",
Using %d format specifier for size_t variables. Should use %zu for size_t to avoid potential format string issues.
| LOG_ERR("error: pacovr static alloc failed for %d bytes, only %d bytes free", | |
| LOG_ERR("error: pacovr static alloc failed for %zu bytes, only %zu bytes free", |
zephyr/include/sof/lib/vpage.h
Outdated
 *
 * @return Pointer to the allocated virtual memory region, or NULL on failure.
 */
void *vpage_alloc(uint32_t pages);

 * @param ptr Pointer to store the address of allocated pages.
 * @retval 0 if successful.
 */
static int vpages_alloc_and_map(uint32_t pages, void **ptr)
The function returns an error code as a negative value.
The function returns an error code as a negative value.
I meant for pages
			err, pages, page_context.total_pages, page_context.free_pages);
	}
	LOG_INF("vpage_alloc ptr %p pages %d free %d/%d", ptr, pages, page_context.free_pages,
		page_context.total_pages);
...and it should really be a LOG_DBG()
fixed the condition but keeping INF atm since this is new and we need the data.
softwarecki left a comment
Alright, here’s my round of nitpicking:
- Looks like the ghost of memory zone is back, now wearing a fancy new name: region type. Quick question - will cacheability secretly depend on this type? Or is that just a plot twist waiting to happen?
- Consider moving page mapping logic into vregion. For lifetime allocators, mapping pages progressively as needed could help reduce memory usage.
- What is the purpose of the text region type? Whether the vregion is for a pipeline or a module, module executable code can be shared across multiple pipelines, so this seems the wrong place.
- Should memory management code be integrated into Zephyr? My understanding was that this was the intended direction for VMH.
- Function naming feels inconsistent: *_vpages vs vregion_*. Worth aligning these.
I submitted a change request because this design looks like it's meant to replace the virtual heap, but it's missing a critical feature for deep buffering: predefined buffer sizes. The buffer size provided via IPC is only a hint. The firmware tries to allocate the largest possible buffer, and if memory is tight, it falls back to a smaller one. If the requested size exceeds available memory, allocation should still create the largest feasible buffer. Without predefined buffers this will saturate memory and block further allocations. With predefined buffer sizes, we can pick the largest available option while leaving others for future allocations.
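To make the predefined-size fallback concrete, a sketch — the size table and helper are illustrative, not part of this PR:

/* Illustrative: predefined buffer sizes, largest first. Pick the
 * largest option that fits the IPC hint and can currently be
 * satisfied, instead of saturating memory with one oversized buffer.
 */
static const size_t predefined_sizes[] = {
	0x20000, 0x10000, 0x8000, 0x4000, 0x2000,
};

static void *buffer_alloc_best_fit(size_t hint)
{
	for (size_t i = 0; i < ARRAY_SIZE(predefined_sizes); i++) {
		if (predefined_sizes[i] > hint)
			continue;	/* larger than the hint */

		void *buf = vpage_alloc(predefined_sizes[i] /
					CONFIG_MM_DRV_PAGE_SIZE);
		if (buf)
			return buf;	/* largest feasible option */
	}

	return NULL;	/* nothing fits right now */
}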
	/* store the size and virtual page number in first free alloc element,
	 * we have already checked for a free element before the mapping.
	 */
	for (int i = 0; i < VPAGE_MAX_ALLOCS; i++) {
Use page_context.num_elems as array index.
I've renamed the variable, as this is the number of elems in use rather than the total number of elems to search.
maybe rather use lists to avoid scanning the array each time. Can be a follow-up optimisation
zephyr/lib/vpages.c
Outdated
 * Allocates virtual memory pages from the virtual page allocator.
 *
 * @param pages Number of 4kB pages to allocate.
 * @retval ptr to allocated pages if successful.
There is no ptr param.
zephyr/lib/vpages.c
Outdated
	int ret;

	/* check for valid ptr which must be page aligned */
	if (!ptr || ((uintptr_t)ptr % CONFIG_MM_DRV_PAGE_SIZE) != 0) {
Use IS_ALIGNED(ptr, CONFIG_MM_DRV_PAGE_SIZE). Optional: also check || !page_context.num_elems.
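i.e. something like this combined check (a sketch, using the num_elems_in_use rename from earlier in the thread and assuming the enclosing function returns an int error code):

	/* reject NULL/unaligned pointers; optionally bail out early when
	 * nothing is allocated at all
	 */
	if (!ptr || !IS_ALIGNED(ptr, CONFIG_MM_DRV_PAGE_SIZE) ||
	    !page_context.num_elems_in_use)
		return -EINVAL;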
| LOG_DBG("found allocation element %d pages %d vpage %d for ptr %p", | ||
| i, page_context.velems[i].pages, | ||
| page_context.velems[i].vpage, ptr); | ||
|
|
To eliminate the loop during page allocation, I suggest replacing the removed element with the last item in the array. When allocating, the new element is always added at the end of the array.

	page_context.num_elems--;
	page_context.velems[i] = page_context.velems[page_context.num_elems];
	page_context.velems[page_context.num_elems].pages = 0;
	page_context.velems[page_context.num_elems].vpage = 0;
See the comment above about renaming the variable to make the meaning clearer.
When allocating, the new element is always added at the end of the array.
ehm, and if it's already taken? Switching to a "free" and a "used" list would eliminate the allocation loop though.
it's always a small haystack due to the regions being big and the memory size being small, and this is also very simple. I doubt we will have more than 16 live regions.
ehm, and if it's already taken?
@lyakh If it is taken, it means that the array is full.
It seems my suggestion was misunderstood. The idea is to append elements sequentially at indexes 0, 1, 2, 3, etc. When removing the entry at index 1, you move the last valid element into that position. For example, if the last element is at index 3, after the removal the array holds elements 0, 3, 2.
We do not need to preserve the order of elements. Since page_context.num_elems - 1 always points to the last valid element, there is no need to search for it.
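For reference, the free/used list alternative mentioned above could be sketched with Zephyr's slist — a hypothetical reshaping of valloc_elem, not this PR's code:

#include <zephyr/sys/slist.h>
#include <zephyr/sys/util.h>

/* Hypothetical: track elements on free/used lists so allocation never
 * scans the array; slots move between lists in O(1).
 */
struct valloc_elem_node {
	sys_snode_t node;
	uint16_t pages;
	uint16_t vpage;
};

static sys_slist_t free_elems;	/* pre-populated with all slots at init */
static sys_slist_t used_elems;

static struct valloc_elem_node *elem_take(void)
{
	sys_snode_t *n = sys_slist_get(&free_elems);

	if (!n)
		return NULL;	/* all tracking slots in use */

	sys_slist_append(&used_elems, n);
	return CONTAINER_OF(n, struct valloc_elem_node, node);
}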
zephyr/lib/vregion.c
Outdated
	/* calculate new heap object size for object and alignments */
	heap_obj_size = aligned_ptr - heap->ptr + size;

	/* check we have enough shared static space left */
zephyr/lib/vregion.c
Outdated
 * @param[in] vr Pointer to the virtual region instance.
 * @param[in] ptr Pointer to the memory to free.
 */
void lifetime_free(struct vregion *vr, struct vlinear_heap *heap, void *ptr)
Missing static? lifetime_alloc is static.
Unused vr param.
zephyr/lib/vregion.c
Outdated
		return interim_alloc(vr, &vr->interim_shared, size, alignment);
	case VREGION_MEM_TYPE_LIFETIME_SHARED:
		return lifetime_alloc(vr, &vr->lifetime_shared, size,
				      alignment < CONFIG_DCACHE_LINE_SIZE ? CONFIG_DCACHE_LINE_SIZE : alignment,
Move this logic to the lifetime_alloc function.
This avoids the need to pass a shared flag today.
MAX(alignment, CONFIG_DCACHE_LINE_SIZE)
	}

	/* allocate module in correct vregion */
	//TODO: add coherent flag for cross core DP modules
Consider adding the ZERO flag as well to replace rzalloc and avoid introducing additional memset calls.
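A sketch of such a flag — the flag name and wrapper are hypothetical:

#include <string.h>
#include <zephyr/sys/util.h>

/* Hypothetical VREGION_FLAG_ZERO, modeled on rzalloc() semantics, so
 * callers don't each add their own memset().
 */
#define VREGION_FLAG_ZERO	BIT(0)

void *vregion_alloc_flags(struct vregion *vr, int type, uint32_t flags,
			  size_t size)
{
	void *ptr = vregion_alloc(vr, type, size);

	if (ptr && (flags & VREGION_FLAG_ZERO))
		memset(ptr, 0, size);	/* zero in one place */

	return ptr;
}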
		return;
	}

	LOG_ERR("error: vregion free invalid pointer %p", ptr);
No ghosts here, the old zone was tied to very limited memory types and capabilities from ancient HW, e.g. to make sure we didn't squander things like DMAable memory. A region is partitioned into different areas tied to usage in our pipeline lifecycle. There is no type like DMA, LP or HP RAM; it's all just virtual pages. A region is partitioned this way as it's easier to then assign attributes for sharing to different cores/domains and to also limit execution (all where permitted by HW).
What would moving the remapping logic gain us? Keeping the page allocator in a separate file/API keeps things simpler and exposes a page allocator API that can be used as needed, independently of regions. Agreed, growing the lifetime allocators would save some pages if the runtime size < topology size, but this would lead to fragmentation, i.e. we would have to track several virtual pages (start, size) per pipeline. This could come later if we want to add more tracking, but keeping it simple is key at first.
So it's an optional usage partition, with the main use being TEXT for 3P DP modules. This keeps domain tracking easier and these pages will get RO EXEC permission for that domain.
Not atm. We need to get this working for audio as a first step. If the vpages part is useful and generic, then yes, it can go to Zephyr. The vregion part is probably too specific to our use case.
These are 2 separate APIs hence the difference in naming.
This is factored into the vregion size allocation for KPB and Deepbuffer-like use cases today, i.e. SW has to include it in the total memory budget. It's not perfect, but I don't want to change too much here until pipeline 2.0 is completed (at which point we can factor in the pipeline 2.0 changes around the buffers).
zephyr/Kconfig
Outdated
	  This setting defines the maximum number of virtual memory allocation
	  elements that can be tracked. Each allocation element represents a
	  contiguous block of virtual memory allocated from the virtual memory
	  heap. Increasing this number allows for more simultaneous allocations,
"virtual memory heap" is what is currently called VMH, and this infrastructure isn't related to it, so maybe better try to find a different description
 * @brief Allocate virtual pages
 * Allocates a specified number of contiguous virtual memory pages.
 *
 * @param[in] pages Number of 4kB pages to allocate.
should we really hard-specify the size?
This is a runtime decision based on topology so yes the size must be aligned.
@lgirdwood sorry, I meant "4kB." I suppose we aren't getting any other page sizes ever, but still maybe write something like "Number of memory pages (currently 4kB large)..." or similar
zephyr/include/sof/lib/vpages.h
Outdated
/**
 * @brief Allocate virtual pages
 * Allocates a specified number of contiguous virtual memory pages.
at least to me this isn't "sufficiently" clear: I think what is meant here is allocating N physical pages and mapping them to sequential virtual page addresses, not just reserving those addresses to later be mapped to physical pages?
added more context to comment.
 * This allocator manages the allocation and deallocation of virtual memory pages from
 * a predefined virtual memory region which is larger than the physical memory region.
 *
 * Both memory regions are divided into 4kB pages that are represented as blocks in a
maybe better not hard-code page size - throughout the code. CONFIG_MM_DRV_PAGE_SIZE
This is an ABI since the count will come from topology and also the manifest (if used). The macro is used in the code, but the ABI today is 4kB pages (and we can update comments and ABI in future if the page size ever changes).
an ABI between the host and the DSP? It's build-time selectable since it's architecture-specific, I wouldn't even call it an ABI. The interface between the driver and the firmware OTOH is an ABI, but there it isn't fixed, is it? We never pass page counts there?
It's in the manifest/IPC4. I'm going to stick with this today so they are all aligned.
 */
struct valloc_elem {
	uint16_t pages; /* number of 4kB pages allocated in contiguous block */
	uint16_t vpage; /* virtual page number from start of region */
I wouldn't use fixed-size types, here could be unsigned short, below unsigned int
style - it's helpful here when mentally mapping bits for size.
	mod->input_buffers =
#if CONFIG_SOF_VREGIONS
		vregion_alloc(vregion, VREGION_MEM_TYPE_LIFETIME_SHARED,
			      sizeof(*mod->input_buffers) * mod->max_sources);
the allocation wasn't COHERENT, why SHARED? Also in other locations
	} else {
		mod->output_buffers = NULL;
	}
	memset(mod->output_buffers, 0, sizeof(*mod->output_buffers) * mod->max_sinks);
this will panic if else above is taken
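A minimal fix would be to keep the memset inside the success branch — a sketch reconstructed around the visible hunk (the allocation call above the else is not shown in this diff):

	if (mod->max_sinks) {
		mod->output_buffers =
			vregion_alloc(vregion, VREGION_MEM_TYPE_LIFETIME_SHARED,
				      sizeof(*mod->output_buffers) * mod->max_sinks);
		if (!mod->output_buffers)
			return -ENOMEM;

		/* zero only when an array was actually allocated */
		memset(mod->output_buffers, 0,
		       sizeof(*mod->output_buffers) * mod->max_sinks);
	} else {
		mod->output_buffers = NULL;	/* no sinks, nothing to zero */
	}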
src/schedule/zephyr_dp_schedule.c
Outdated
	/* free task stack */
#if CONFIG_SOF_VREGIONS
	struct vregion *vregion = module_get_vregion(pdata->mod);
	vregion_free(vregion, (__sparse_force void *)pdata->p_stack);
user-space stack has to be created using Zephyr API
				     IPC_COMP_IGNORE_REMOTE);
	if (ipc_pipe) {
		pipeline = ipc_pipe->pipeline;
	}
can modules be instantiated without a pipeline?
	/* set up ipc4 configuration items if needed from topology */
	if (ipc_pipe) {
		dev->pipeline = ipc_pipe->pipeline;
		dev->pipeline = pipeline;
looks like yes, there might be no pipeline
Add a simple virtual page allocator that uses the Zephyr memory mapping infrastructure to allocate pages from a virtual region and map them to physical pages. Due to simplicity, the number of allocations is limited to CONFIG_SOF_VPAGE_MAX_ALLOCS.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Add support for per pipeline and per module virtual memory regions. The intention is to provide a single virtual memory region per pipeline or per DP module that can simplify module/pipeline memory management. The region takes advantage of the way pipelines and modules are constructed, destroyed and used during their lifetimes:

1) memory tracking - 1 pointer/size per pipeline or DP module.
2) memory read/write/execute permissions.
3) memory sharing across cores and domains.
4) cache utilization.

Modules and pipelines will allocate from their region only and this will be abstracted via the allocation APIs.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Comments addressed, removed all patches except for the vpage allocator and vregion allocator; other patches will follow up in subsequent PRs.
	int ret;

	/* check for valid ptr which must be page aligned */
	if (IS_ALIGNED(ptr, CONFIG_MM_DRV_PAGE_SIZE) == 0) {
I think IS_ALIGNED() has boolean semantics, so maybe just !IS_ALIGNED(...) instead of comparing to 0
| LOG_DBG("found allocation element %d pages %d vpage %d for ptr %p", | ||
| i, page_context.velems[i].pages, | ||
| page_context.velems[i].vpage, ptr); | ||
|
|
There was a problem hiding this comment.
When allocating, the new element is always added at the end of the array.
ehm, and if it's already taken? Switching to a "free" and a "used" list would eliminate the allocation loop though.
	/* should never fail */
	assert(!err);

	LOG_INF("vptr %p free/total pages %d/%d", ptr, page_context.free_pages,
INF today as it's low frequency and needs to mature alongside the kernel patches.
	interim_size = ALIGN_UP(interim_size, CONFIG_MM_DRV_PAGE_SIZE);
	lifetime_shared_size = ALIGN_UP(lifetime_shared_size, CONFIG_MM_DRV_PAGE_SIZE);
	interim_shared_size = ALIGN_UP(interim_shared_size, CONFIG_MM_DRV_PAGE_SIZE);
	text_size = ALIGN_UP(text_size, CONFIG_MM_DRV_PAGE_SIZE);
not sure what we need "text" here for. This is a generic allocator for modules/pipelines; it cannot be used to hold LLEXT .text, .data, .rodata because those need to be mapped to pre-defined addresses
This needs to be fixed. llext modules must be PIC.
@lgirdwood why? this saves us some runtime linking time, and possibly some execution time by eliminating GOT / PLT use
 *
 * 5. Shared Lifetime Allocator (optional): A simple incrementing allocator used for long
 *    term static allocations that can be shared between multiple cores or domains. This
 *    allocator aligns allocations to cache line boundaries to ensure cache coherency.
not sure I see a use-case for either of the shared regions. Although - maybe if you have a pipeline where a component is running in DP mode on a different core? So buffers between it and the adjacent components are only needed for the lifetime of the pipeline, but they have to be shared between cores... Maybe mention this use-case here to make it clearer?
Yes, cross core DP or cross domain DP.
	vr->interim.base = vr->text.base + text_size;
	vr->lifetime.base = vr->interim.base + interim_size;
	vr->lifetime_shared.base = vr->lifetime.base + lifetime_size;
	vr->interim_shared.base = vr->lifetime_shared.base + lifetime_shared_size;
should "shared" be uncached and others cached?
nothing uncached. Cross core synchronisation will need a new client helper API (or reuse coherent.h with some mods) to allocate coherent objects that can be atomically acquired/released.
Just needs care if this is bolted to the sof/lib/alloc.h interface. Users of this interface assume they can get both cached and uncached allocs and we've done quite a bit of work to remove adhoc cache/uncached conversions from the user-side. The allocator should support this. For application code, we should mostly use cached, while for any allocs of Zephyr kernel objects (or structs containing them) we should use uncached.
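If both views must be exposed through one allocator, one option is deriving the uncached alias from the cached allocation — a sketch, assuming the platform provides cached/uncached address aliases via Zephyr's cache helpers (the wrapper name is hypothetical):

#include <zephyr/cache.h>

/* Sketch: allocate from the (cached) vregion, then return an uncached
 * alias for Zephyr kernel objects that must not live in cached memory.
 * Assumes sys_cache_uncached_ptr_get() is available on the platform.
 */
void *vregion_alloc_uncached(struct vregion *vr, int type, size_t size)
{
	void *cached = vregion_alloc(vr, type, size);

	if (!cached)
		return NULL;

	return sys_cache_uncached_ptr_get(cached);
}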
	/* also align up size to D$ bytes if asked - allocation head and tail aligned */
	if (align == CONFIG_DCACHE_LINE_SIZE)
		size = ALIGN_UP(size, CONFIG_DCACHE_LINE_SIZE);
this special-casing is a bit strange. Can you think of any examples where you care about start address alignment but you don't care about size alignment? If you need to align to some value, usually you want to "own" an integer number of those alignment units, don't you?
So this is trying to target use cases where we have some cache aware code or we want to avoid clobbering the trailing bytes, i.e. allocate 32 bytes and pad 32 bytes on a 64 byte cache line, whilst also being able to allocate a collection of objects with SIMD vector types (not arrays) on minimal cache lines.
In a nutshell: the client is asking for cache alignment, so make sure we pad the tail; otherwise support packing to minimize cache line usage.
@lgirdwood right, but do you have any use-cases where the caller needs to align the beginning but not the size?
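The head/tail case lgirdwood describes, roughly (a sketch of the two policies, not the PR's exact code):

	/* head: align the start address as requested */
	uint8_t *aligned_ptr = (uint8_t *)ALIGN_UP((uintptr_t)heap->ptr, align);

	/* tail: when cache-line alignment is requested, also pad the size
	 * so the next allocation cannot share this object's last cache
	 * line; smaller alignments stay packed to minimise line usage
	 */
	if (align >= CONFIG_DCACHE_LINE_SIZE)
		size = ALIGN_UP(size, CONFIG_DCACHE_LINE_SIZE);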
		return interim_alloc(&vr->interim_shared, size, alignment);
	case VREGION_MEM_TYPE_LIFETIME_SHARED:
		return lifetime_alloc(&vr->lifetime_shared, size,
				      MAX(alignment, CONFIG_DCACHE_LINE_SIZE));
why are only life-time shared allocations forced to cache-line alignment?
kv2019i left a comment
Looks good. A few minor comments, but nothing to add to reviews already done. I didn't catch any blockers. Biggest concern is @lgirdwood your comment on only providing cached addresses. This can create some issues, but depends on how this is integrated.
PS A separate ztest case would be nice.
config SOF_VREGIONS
	bool "Enable virtual memory regions"
	default y if ACE
	default n
@lgirdwood The "default n" is superfluous, can be dropped (n is the default and not having it is better so a default can be set in defconfig files).
softwarecki left a comment
No ghosts here, the old zone was tied to very limited memory types and capabilities from ancient HW e.g. to make sure we didn't squander things like DMAable memory etc.
A region is partitioned into different areas tied to usage in our pipeline lifecycle. There is not type like DMA, LP, HP RAM etc its all just a virtual page
I understand that those zones we removed some time ago were caused by hardware limitations. But now we are starting to split things again like in the old SOF: this region has lifetime allocations, this one temporary, this one lifetime but without cache, this one temporary without cache. That's a lot of memory fragmentation! Maybe it wouldn't even be that painful if it were only on virtual memory mapped on demand.
Quick question - will cacheability secretly depend on this type? Or is that just a plot twist waiting to happen?
A region is partitioned this way as its easier to then assign attributes for sharing to different cores
So under region type we will hide whether the memory is cacheable.
Agree, growing the lifetime allocators would save some pages if the runtime size < topology size, but this would lead to fragmentation. i.e. we have to track several virtual page (start, size) per pipeline. This could be something that can come later if we want to add more tracking, but keeping it simple is key at first.
What I meant was not that virtual allocations should grow together with the lifetime allocator. The entire virtual range is reserved upfront. We only map physical pages when the used size increases. Fragmentation of physical pages is irrelevant here; it is likely fragmented already and has no impact.
We also do not need any additional bookkeeping. The vlinear_heap structure already tracks how much memory is in use. As this value grows and crosses a page boundary, we can simply map the next pages. There is no freeing in this allocator, so we do not need to handle unmapping.
I see room for another allocator that could benefit from the fact that this memory is not mapped upfront: an opportunistic allocator for the KPB. We have a contiguous virtual range reserved for the KPB, and we map as many pages as we can to satisfy the KPB request. A next step could be reclaiming KPB memory when the memory is needed somewhere else. That is the harder part of the adventure.
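A sketch of that on-demand growth, assuming a vlinear_heap that records how much of its reserved virtual range is currently backed (the mapped field and helper are hypothetical):

/* Hypothetical on-demand backing for a linear heap: the virtual range
 * is reserved up front, physical pages are mapped only as 'used'
 * crosses page boundaries. No unmapping, since this heap never frees.
 */
struct vlinear_heap_sketch {
	uintptr_t base;		/* start of reserved virtual range */
	size_t mapped;		/* bytes currently backed by phys pages */
};

static int lifetime_grow(struct vlinear_heap_sketch *heap, size_t new_used)
{
	size_t needed = ALIGN_UP(new_used, CONFIG_MM_DRV_PAGE_SIZE);
	int ret;

	if (needed <= heap->mapped)
		return 0;	/* still inside already-mapped pages */

	/* phys = 0 follows the call pattern in this PR: let the driver
	 * pick free physical pages
	 */
	ret = sys_mm_drv_map_region((void *)(heap->base + heap->mapped),
				    0, needed - heap->mapped,
				    SYS_MM_MEM_PERM_RW);
	if (!ret)
		heap->mapped = needed;

	return ret;
}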
	for (int i = 0; i < VPAGE_MAX_ALLOCS; i++) {
		if (page_context.velems[i].pages == 0) {
			page_context.velems[i].pages = pages;
			page_context.velems[i].vpage =
				(POINTER_TO_UINT(vaddr) -
				 POINTER_TO_UINT(page_context.vpage_blocks.buffer)) /
				CONFIG_MM_DRV_PAGE_SIZE;
			page_context.num_elems_in_use++;
			break;
		}
	}
To eliminate the loop during page allocation, I suggest:
	page_context.velems[page_context.num_elems_in_use].pages = pages;
	page_context.velems[page_context.num_elems_in_use].vpage =
		(POINTER_TO_UINT(vaddr) -
		 POINTER_TO_UINT(page_context.vpage_blocks.buffer)) /
		CONFIG_MM_DRV_PAGE_SIZE;
	page_context.num_elems_in_use++;
/* uncache persistent across all cores */
static struct vpage_context page_context;
static sys_bitarray_t bitmap;
Can we move bitmap to vpage_context?
static sys_bitarray_t bitmap;

/* singleton across all cores */
static int vpage_init_done;
It seems that SYS_INIT is called only once, and there is no risk of additional calls from other cores. Shouldn't this be placed in non-cacheable memory if it is meant to be shared across cores?
EDIT - split into smaller pages and vregion only patches.

WIP: To be split into smaller patches.

Add a simple contiguous virtual page allocator.
Add a virtual region per pipeline/DP module.
Integrate into mod_alloc() and mod_free().

TODO: Make sure all pipeline resources exist within same virtual region.