Channel: Raspberry Pi Forums

SDK • Re: pico_w instability with wifi and freertos

Hi All,

Thanks for the help, very useful. I'm continuing to investigate under the assumption that I'm doing something wrong.

Regarding stacks, I tend to agree that's a juicy initial assumption, and I have now investigated this in some depth, along with more areas. I'll share my understanding and investigation notes here. Feel free to correct any misunderstandings.

Stack Usage

I believe there are two types of stacks to care about for my scenario:
* Per-Core (core0/core1) stacks in the SCRATCH_X/SCRATCH_Y sections
* FreeRTOS task stacks

Per-Core Stacks
The per-core stacks (viewtopic.php?t=318030) are defined by the pico-sdk linker script. Non-FreeRTOS execution on core0 and core1 uses those stacks during startup and typical operation.

Configuration parameters
* PICO_STACK_SIZE / PICO_CORE1_STACK_SIZE
* PICO_USE_STACK_GUARDS

If you enable PICO_USE_STACK_GUARDS, memory protection is installed in the last 32 bytes of the memory bank and I see a hard fault occur if you exceed the configured size of the stack.

The actual RP2040 spec'd size of these memory banks is 4k each, but for whatever reason the stacks default to 2k. As it turns out, I have some huge static initialization, and increasing PICO_STACK_SIZE to 4k stopped the core0 memory protection from tripping (whereas it did originally). I am not using core1 for now.

Notably, to determine my startup stack usage, I enabled -fstack-usage to produce .su files on compilation, then wrote a script to parse them and sort by size. Some of my bigger static-init functions used nearly 2k of stack each. I have not yet found a good way to determine total stack depth (caller plus callee), but PICO_USE_STACK_GUARDS didn't trip, so I suspect it is under 4k.

All told, I have a much higher degree of confidence this is not the issue (doesn't exceed core0 stack, and certainly doesn't reach the heap).


FreeRTOS Stacks

FreeRTOS stacks are the per-task stacks, allocated either statically or in the heap.

FreeRTOS allows you to monitor for overages on these stacks (https://www.freertos.org/Stacks-and-sta ... cking.html) by a few methods, but ultimately the kernel will:
* "paint" the task stack with a known value at task start
* look at how many contiguous painted values remain untouched at the end of the task stack (this tells you the maximum stack usage so far, i.e. the high-water mark)

configCHECK_FOR_STACK_OVERFLOW 2

Because this is a pretty black-and-white assessment, I was able to confirm that the task stacks in my application were not overflowing.
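To make the paint-and-scan idea concrete, here is a small host-side simulation (a sketch, not FreeRTOS code). It assumes a descending stack where index 0 of the buffer is the lowest address, i.e. the end an overflow would run into, and uses 0xA5 as the fill byte, which is the value FreeRTOS uses (tskSTACK_FILL_BYTE). The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFillByte = 0xA5;  // same value FreeRTOS paints with

// "Paint" a fresh stack: every byte starts as the known fill value.
std::vector<std::uint8_t> make_painted_stack(std::size_t size) {
    return std::vector<std::uint8_t>(size, kFillByte);
}

// Simulate a task whose deepest call chain consumed `used` bytes.
// The stack grows downward, so usage starts at the highest index.
void simulate_use(std::vector<std::uint8_t>& stack, std::size_t used) {
    assert(used <= stack.size());
    for (std::size_t i = stack.size() - used; i < stack.size(); ++i) {
        stack[i] = 0x00;  // any value other than the fill byte
    }
}

// Count how many bytes at the low end are still painted. This is the
// minimum amount of stack that has ever been left free.
std::size_t high_water_mark(const std::vector<std::uint8_t>& stack) {
    std::size_t untouched = 0;
    while (untouched < stack.size() && stack[untouched] == kFillByte) {
        ++untouched;
    }
    return untouched;
}
```

On the target, uxTaskGetStackHighWaterMark() reports the equivalent figure in words rather than bytes. One caveat of the technique: stack data that happens to equal the fill value is indistinguishable from paint, so the result can slightly overstate the free space.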


Heap

I am using C++ and STL containers extensively, though I don't really call malloc/free myself.

I am using FreeRTOS-Heap3, which uses the native malloc/free (and also moots any FreeRTOS heap sizing params). I am using only core0 at the moment.

From what I've read, the PICO_HEAP_SIZE parameter is only a "minimum" amount of heap to reserve; sbrk will keep growing the heap into whatever RAM remains after .data and .bss. I ignore this parameter.

I have enabled:
* PICO_MALLOC_PANIC=1 to panic in the event of memory exhaustion
* PICO_USE_MALLOC_MUTEX=1 to protect each malloc/free with a mutex

I also enabled the FreeRTOS configUSE_MALLOC_FAILED_HOOK 1 to catch any bad mallocs.

Further, I have enabled PICO_DEBUG_MALLOC=1, and modified the pico malloc source code, to print out all mallocs and free addresses. External to the RPi Pico, I monitor the stream of allocations and deallocations to check for total allocation size and any unexpected behavior.

I appear to be using ~14kB of heap at my initial state. Approximately half of the memory (~128kB) of the pico is occupied by statics (💪🏻), leaving approx 128kB remaining, so using 14kB is no concern, and there are no signs that it is problematic either.

That said, I do believe I'm seeing a potentially corrupted free() memory address, which of course could be the root of all problems. This is my next target of investigation.
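As an illustration of the external monitoring described above, here is a minimal sketch of a log replayer that tracks live allocations and flags any free() of an address that malloc never returned. The log format ("malloc <hex-addr> <size>" and "free <hex-addr>", one per line) is an assumption made for this example, since the actual output depends on how the pico malloc source was modified. HeapReport and replay_log are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct HeapReport {
    std::size_t live_bytes = 0;             // bytes still allocated at end of log
    std::size_t peak_bytes = 0;             // highest live total seen
    std::vector<std::uintptr_t> bad_frees;  // freed addresses with no matching malloc
};

// Replay a text log of allocations and frees (assumed format, see above).
HeapReport replay_log(const std::string& log) {
    HeapReport report;
    std::map<std::uintptr_t, std::size_t> live;  // address -> size
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string op, addr_text;
        if (!(fields >> op >> addr_text)) continue;
        std::uintptr_t addr = 0;
        try {
            addr = static_cast<std::uintptr_t>(std::stoull(addr_text, nullptr, 16));
        } catch (const std::exception&) {
            continue;  // address field was not hex
        }
        if (op == "malloc") {
            std::size_t size = 0;
            if (!(fields >> size)) continue;
            live[addr] = size;
            report.live_bytes += size;
            report.peak_bytes = std::max(report.peak_bytes, report.live_bytes);
        } else if (op == "free") {
            if (addr == 0) continue;  // free(NULL) is legal and a no-op
            const auto it = live.find(addr);
            if (it == live.end()) {
                report.bad_frees.push_back(addr);  // corrupted pointer or double free
            } else {
                report.live_bytes -= it->second;
                live.erase(it);
            }
        }
    }
    return report;
}
```

A non-empty bad_frees list is the interesting signal here: it means either a double free or a pointer that was damaged between malloc and free.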


cppcheck

I also ran cppcheck against my code base; it conveniently understands the compile_commands.json file generated by the SDK build. For anyone's benefit, the command I ran is in the Code block further down. I later viewed the results in cppcheck-gui.

Nothing stood out, aside from a handful of snprintf format specifiers where a uint32_t argument doesn't match the unsigned int the format expects (arghh), etc. Will pick through that.

No array bounds issues flagged or undefined behavior.

Code:

cppcheck --force \
        --enable=all \
        --disable=missingInclude,unusedFunction \
        --inconclusive \
        --library=posix \
        --project="./app/Demo/build/compile_commands.json" \
        -j 4 \
        --xml \
        --output-file=check.xml

Lastly

There are some other areas I will check which may or may not be related to memory corruption. One is my use of the kernel queue to move blobs between tasks, which I want to triple-check isn't problematic.

It's very unfortunate there's not a good way to move objects (e.g. structs with STL vectors in them) through the queue, as opposed to the raw memory-copy push/pop operation. Happy to hear if there's a way.
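One pattern that may be worth trying (a sketch only, untested on the Pico) is to queue a pointer to a heap-allocated object instead of the object itself, so the raw byte copy only ever touches a plain pointer. Below, ByteCopyQueue is a hypothetical host-side stand-in that mimics the copy-by-bytes behaviour of a FreeRTOS queue; Blob, send_blob and receive_blob are likewise made-up names.

```cpp
#include <cstddef>
#include <cstring>
#include <deque>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a kernel queue: copies item_size raw bytes per item.
class ByteCopyQueue {
public:
    explicit ByteCopyQueue(std::size_t item_size) : item_size_(item_size) {}

    void send(const void* item) {
        const auto* bytes = static_cast<const unsigned char*>(item);
        items_.emplace_back(bytes, bytes + item_size_);
    }

    bool receive(void* out) {
        if (items_.empty()) return false;
        std::memcpy(out, items_.front().data(), item_size_);
        items_.pop_front();
        return true;
    }

private:
    std::size_t item_size_;
    std::deque<std::vector<unsigned char>> items_;
};

// A non-trivially-copyable payload: unsafe to memcpy, safe to pass by pointer.
struct Blob {
    std::string tag;
    std::vector<int> samples;
};

// Producer side: give up ownership and queue only the raw pointer.
void send_blob(ByteCopyQueue& queue, std::unique_ptr<Blob> blob) {
    Blob* raw = blob.release();
    queue.send(&raw);  // copies sizeof(Blob*) bytes, i.e. the pointer itself
}

// Consumer side: take the pointer back and re-wrap it so it gets deleted.
std::unique_ptr<Blob> receive_blob(ByteCopyQueue& queue) {
    Blob* raw = nullptr;
    if (!queue.receive(&raw)) return nullptr;
    return std::unique_ptr<Blob>(raw);
}
```

With the real kernel API, the queue would be created with an item size of sizeof(Blob*), and the sender would need to delete the object itself if the send fails or times out. The obvious cost is one heap allocation per message, and the object is then freed on a different task than the one that allocated it.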

Thanks all for the help.

I did look at some of the references folks posted, btw, and will be digging in more in the coming days.

Statistics: Posted by scipi — Sun Mar 31, 2024 6:50 pm


