one more time: initialisation problems #164

Open
DrMarkusKrug opened this issue Jan 15, 2025 · 1 comment

Comments

@DrMarkusKrug

Dear all,
I have already written about strange problems with memory handling in the uros library. I managed a workaround a couple of months ago, but came across the same problem again.
So this should be the first post of a series in which I want to solve the problem systematically (if I'm able to). Everybody is welcome to join and contribute to this series.

From my perspective there are at least two severe problems with the uros library:
1.) Wrong memory allocation/free treatment
2.) Manipulation of the application's ISR settings inside the library, without any documentation about it.

Some words on the first problem. I'm working with an STM32L4 device that has limited memory resources. I suspect that others working on better-equipped STM32 devices might have the same problem but haven't noticed it so far, because the comparably large memory masks the issue. However, I'm deeply concerned that the issue is quite serious (writing into memory that is already used by something else). So, even if your STM32 uros application is running, it might be worth observing the stack usage of the tasks that deal with uros.
In addition, I noticed strange behaviour when rclc_support_init() fails: on my device it takes about 12 s until the function call returns. So there must be a lot of delay() calls somewhere in the uros library, but why?

I want to point to something else that makes me suspicious. In my application (currently not running because of uros initialization problems), I see the following in the memory map:
[Screenshot from 2025-01-15 17-15-12: memory map]

So these custom_xxx variables are consuming a lot of RAM, which is unusually high. Whatever those data structures are doing, this is not really appropriate for embedded systems. However, it doesn't explain my initialization trouble, because these variables are allocated statically and are therefore checked during the build process.
Finally, my libmicroros.a file is about 32 MB. I have never seen a library file anywhere near that size. Yes, I know that shouldn't be a big deal, because the linker picks only the parts it needs. Still, I wonder why this library file is so unusually big.

Some words on the second problem. I tried to use a simple periodic interrupt in the dma_transport functions, because I think it might be a way to work around the memory handling of the uros library by initializing uros outside of a task. To stay compatible with the code provided by uros, I added something like this to the transport functions:
[Image: modified transport function code]
You can see that I check whether the kernel is running and use either osDelay or a custom delay loop (which relies on a periodic interrupt every millisecond). However, this interrupt is never triggered. If I call this custom delay before doing anything with the uros library, it works perfectly fine. So it looks to me as if the uros library is manipulating the ISR settings of my application. Very disappointing, I have to say.
From this screenshot:
[Image: debugger call stack]
you can see that the call stack contains no ISR at all. Nevertheless, one of the listed uros functions must be manipulating the ISR settings; otherwise the variable custom_delay_flag would be set to true in the TIM6 ISR (which is never triggered). So I hang in an endless loop. HAL_Delay() isn't a good alternative because it conflicts with osDelay(), but I will take a closer look at this tomorrow.
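To at least avoid the endless loop while investigating, the custom delay can be bounded so it reports a missing tick instead of hanging. The sketch below is a host-runnable illustration, not the code from the screenshot: `tim6_tick_isr()` is a hypothetical stand-in for the TIM6 handler, `max_polls_per_tick` is a hypothetical bound, and `stdatomic.h` is used so the sketch is well-defined on a PC (on a Cortex-M target a `volatile bool` flag is the usual choice).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Set by the 1 ms timer interrupt (TIM6 in the post). */
static atomic_bool custom_delay_flag = false;

/* Hypothetical stand-in for the TIM6 update ISR body. */
void tim6_tick_isr(void) {
    atomic_store(&custom_delay_flag, true);
}

/* Wait for `ms` timer ticks. If no tick arrives within `max_polls_per_tick`
 * polls, give up and return the number of ticks actually seen, so a timer
 * interrupt that never fires shows up as a short return value, not a hang. */
uint32_t custom_delay_ms(uint32_t ms, uint32_t max_polls_per_tick) {
    uint32_t ticks = 0;
    while (ticks < ms) {
        uint32_t polls = 0;
        /* Consume one tick: read the flag and clear it in one atomic step. */
        while (!atomic_exchange(&custom_delay_flag, false)) {
            if (++polls >= max_polls_per_tick) {
                return ticks; /* interrupt not being serviced */
            }
        }
        ++ticks;
    }
    return ticks;
}
```

A return value smaller than `ms` tells the caller that the timer interrupt stopped arriving at that point, which can be logged right after each micro-ROS call to narrow down where it happens.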

I will continue as soon as I can report something new.

Best Regards
Markus

@pablogs9
Member

pablogs9 commented Jan 15, 2025

Hello @DrMarkusKrug,

Some comments regarding your assertions:

> Wrong memory allocation/free treatment

micro-ROS is designed as a port of the ROS 2 stack to MCUs, as you might know. This means that above the RMW layer, we use ROS 2 packages. Below the RMW layer (the rmw package itself and the XRCE-DDS middleware), dynamic memory is avoided by default (unless you configure it otherwise). In summary, the memory allocation problem you mention may lie in a ROS 2 package, or it may stem from static memory allocation. As far as I know, there is no known memory error in the micro-ROS stack; hopefully this also applies to this specific package, unless some STM32-specific code is interfering.

> However, I'm deeply concerned that the issue is quite serious (writing into memory spaces that are already used by something else). So it might be worth checking, even if your STM32 micro-ROS application is running, how the stack usage behaves for the tasks dealing with micro-ROS.

Usually, embedded development requires carefully monitoring the stack usage of your spawned tasks. micro-ROS is indeed stack-hungry. It wouldn’t be the first time I’ve seen stack smashing caused by a poorly dimensioned stack size in micro-ROS. Please take a look. Returning to the previous point, if there is “memory smashing” beyond stack overflows, it could be in a ROS 2 package (and would likely be reported on more platforms), or in a set of layers that are statically allocated.
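One common way to measure stack usage is "stack painting": fill the stack with a known byte before the task runs, then count how many bytes at the far end are still untouched. FreeRTOS's uxTaskGetStackHighWaterMark() reports the same kind of quantity for a real task (in words, not bytes). The sketch below only simulates the idea on a plain byte array on a host; the function names are hypothetical, and it assumes a downward-growing stack, as on Cortex-M.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u

/* Fill the whole stack region with the marker before the task starts. */
void stack_paint(uint8_t *stack, size_t size) {
    memset(stack, STACK_FILL_BYTE, size);
}

/* The stack grows down from stack + size, so untouched bytes remain at the
 * low end. Count them: this is the minimum free space ever observed.
 * (Data that happens to equal 0xA5 can make usage look slightly smaller.) */
size_t stack_unused_bytes(const uint8_t *stack, size_t size) {
    size_t unused = 0;
    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        ++unused;
    }
    return unused;
}
```

If the unused count for a micro-ROS task ever reaches zero, the stack has overflowed into whatever lies below it, which matches the "writing into memory already used by something else" symptom.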

Beyond that, I have personally developed numerous micro-ROS applications on multiple platforms (though honestly, STM32 Cube IDE is not my favorite), ranging from very simple publishers to complex micro-ROS architectures. I haven’t encountered any major memory bugs in the micro-ROS software stack. That doesn’t mean the stack is bug-free, of course, but it does mean that in the years I’ve been working with micro-ROS, all memory issues reported so far have been addressed.

> So it might be worth observing, even if your STM32 uros application is running, the stack usage of the tasks that deal with uros.

Beware of the tasks (plural) that deal with uros: micro-ROS is a single-threaded library that is not thread-safe by default, so using micro-ROS from multiple tasks can be problematic.
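If several tasks really must touch micro-ROS, one option is to funnel every call through a single lock. The sketch below uses POSIX threads so it runs on a host; on FreeRTOS the same pattern would use xSemaphoreCreateMutex(), xSemaphoreTake() and xSemaphoreGive(). `uros_publish_stub()` is a hypothetical stand-in for a non-thread-safe call such as a publish, not a micro-ROS API.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a non-thread-safe micro-ROS call. */
static uint32_t publish_count = 0;
static void uros_publish_stub(void) {
    publish_count++; /* unsafe if two threads run this concurrently */
}

/* One lock shared by every task that touches micro-ROS. */
static pthread_mutex_t uros_lock = PTHREAD_MUTEX_INITIALIZER;

void uros_publish_locked(void) {
    pthread_mutex_lock(&uros_lock);
    uros_publish_stub();
    pthread_mutex_unlock(&uros_lock);
}

/* Example "task": publishes n times through the locked wrapper. */
static void *publisher_task(void *arg) {
    uint32_t n = *(const uint32_t *)arg;
    for (uint32_t i = 0; i < n; ++i) {
        uros_publish_locked();
    }
    return NULL;
}
```

The lock only helps if all micro-ROS calls, including executor spinning, go through it; a simpler design is often to let a single task own micro-ROS and have the other tasks hand data to it through an RTOS queue.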

> Additionally, I noticed strange behavior when rclc_support_init() fails. It takes about 12 seconds for the function call to return on my device. So there must be a lot of delay() calls somewhere in the micro-ROS library, but why?

This isn’t actually strange behavior. Calling rclc_support_init() triggers the XRCE-DDS Client-to-Agent session initialization, which has a maximum number of attempts and a certain time window for each attempt:
https://github.com/eProsima/Micro-XRCE-DDS-Client/blob/0301e0dc2312a908b732c3d296d74563a385da35/CMakeLists.txt#L57-L58

By default, there are 10 attempts, each with a 1000 ms timeout, which is roughly 10 seconds total. This can be configured in the colcon.meta file. In fact, instructions on how to create applications that “reconnect” to the agent are documented here:
https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/handle_reconnections/handle_reconnections.html
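The resulting blocking time is simple arithmetic: attempts multiplied by the per-attempt interval. This small sketch just makes the budget explicit; the macro names are hypothetical labels for the two values at the linked CMakeLists.txt lines, and any extra time spent inside the transport (presumably part of the 12 s observed) is not modelled.

```c
#include <stdint.h>

/* Defaults quoted above: 10 attempts, 1000 ms each (hypothetical names). */
#define DEFAULT_SESSION_ATTEMPTS    10u
#define DEFAULT_SESSION_INTERVAL_MS 1000u

/* Approximate time session setup keeps retrying when the agent never
 * answers: every attempt waits its full interval. */
uint32_t session_init_block_ms(uint32_t attempts, uint32_t interval_ms) {
    return attempts * interval_ms;
}
```

With the defaults this gives 10000 ms; configuring, say, 3 attempts of 500 ms would cut it to 1500 ms.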

The delay followed by an error code usually means that your micro-ROS client is not reaching the micro-ROS Agent. 99% of the time this is because the transport is not connecting client and agent, which ends in an error code from rclc_support_init(), because it cannot "init the micro-ROS support service (aka middleware, or client-agent connection)".
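A way to avoid blocking in rclc_support_init() while the agent is unreachable is to check the agent first and only then create the support and entities. The sketch below is loosely modelled on the reconnection tutorial linked above; it is a pure state-transition function so it can be tested on a host. The `ok` input is an assumption: on target it would be the result of a short ping (for example rmw_uros_ping_agent() returning RMW_RET_OK) or of entity creation, depending on the state.

```c
#include <stdbool.h>

typedef enum {
    WAITING_AGENT,      /* ping the agent with a short timeout */
    AGENT_AVAILABLE,    /* agent answered: run init and create entities */
    AGENT_CONNECTED,    /* normal operation, keep pinging periodically */
    AGENT_DISCONNECTED  /* destroy entities, then start waiting again */
} agent_state_t;

/* `ok` = ping result in WAITING_AGENT / AGENT_CONNECTED,
 *        entity-creation result in AGENT_AVAILABLE, ignored otherwise. */
agent_state_t agent_next(agent_state_t s, bool ok) {
    switch (s) {
    case WAITING_AGENT:
        return ok ? AGENT_AVAILABLE : WAITING_AGENT;
    case AGENT_AVAILABLE:
        /* On failure, clean up partially created entities before retrying. */
        return ok ? AGENT_CONNECTED : WAITING_AGENT;
    case AGENT_CONNECTED:
        return ok ? AGENT_CONNECTED : AGENT_DISCONNECTED;
    case AGENT_DISCONNECTED:
        return WAITING_AGENT;
    }
    return WAITING_AGENT;
}
```

With this shape, only the cheap ping is repeated while the transport is down, and the long session handshake runs once the agent has actually answered.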

> I want to point to something else that makes me suspicious. In my application (currently not running because of micro-ROS initialization problems), I see the following in the memory map:

Once again, there are static memory pools whose sizes are fully configurable in the colcon.meta file:
https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/memory_management/memory_management.html#entity-creation

> So these custom_xxx variables are consuming a lot of RAM, unusually high.

You can reduce them to fit your platform’s constraints.
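To illustrate why these pools show up as large, fixed RAM consumers, here is a minimal sketch of a compile-time-sized entity pool. Everything in it is hypothetical (the slot layout, `MAX_PUBLISHERS`, the helper names); it only mirrors the general idea that capacity set at build time fixes the RAM cost, and that running out of slots makes entity creation fail instead of allocating more memory.

```c
#include <stddef.h>
#include <stdint.h>

#ifndef MAX_PUBLISHERS
#define MAX_PUBLISHERS 4 /* hypothetical build-time capacity knob */
#endif

typedef struct {
    uint8_t in_use;
    uint8_t bookkeeping[184]; /* hypothetical per-entity state */
} pool_slot_t;

/* Zero-initialised static array: its size is fixed at link time and
 * appears in the memory map whether or not the slots are ever used. */
static pool_slot_t publisher_pool[MAX_PUBLISHERS];

pool_slot_t *pool_acquire(void) {
    for (size_t i = 0; i < MAX_PUBLISHERS; ++i) {
        if (!publisher_pool[i].in_use) {
            publisher_pool[i].in_use = 1;
            return &publisher_pool[i];
        }
    }
    return NULL; /* pool exhausted: creation fails, nothing is malloc'd */
}

void pool_release(pool_slot_t *slot) {
    slot->in_use = 0;
}

size_t pool_ram_bytes(void) {
    return sizeof publisher_pool;
}
```

Halving the capacity halves the RAM the pool occupies, which is the trade-off the configuration options above let you tune.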

> Finally, my libmicroros.a file is about 32 MB. I’ve never seen a library file that large. Yes, I know it shouldn’t be a big issue because the linker only chooses the necessary parts, but I still wonder why this library file is so unusually big.

We’re aware of this, and it’s intentional. Unless your computer cannot store a 32 MB .a file, this shouldn’t be a practical issue. As you mentioned, the linker will include only the object files and functions that your application actually uses. Unfortunately, we don’t know which of the hundreds of ROS 2 message definitions and core packages a user’s application will need, so we chose to include most of them to cover most use cases in the most straightforward way. We made this decision fully aware that the ~30 MB static library shrinks to tens of kilobytes once the user’s application is linked. That is more than enough to fit in modern MCU code storage, and the library can even be built with [size optimization (-Os)](https://docs.vulcanexus.org/en/latest/rst/tutorials/micro/custom_platforms/custom_platforms.html) to fit smaller platforms.

> Very disappointing, I have to say.

Lastly, regarding your second issue: embedded development is often tricky, obscure, and highly platform-dependent. We on the micro-ROS team try to provide an open and free solution for as many relevant platforms as possible. However, these ports don’t cover every single use case for every single transport and configuration. Most of the time, they require fine-tuning, adaptations, and user-side configurations. Personally, I understand that it can be frustrating, but we cannot provide an out-of-the-box solution for everything, as our team is very very very resource-limited.

So, once again, I hope some of my comments help. If you find proper solutions, please do not hesitate to contribute back with bug fixes or documentation; that will help a lot.
