Update pubsubclient to v2.11.0 by thingsboard #683

Merged
merged 8 commits into from
Dec 9, 2024

Conversation

uschindler
Contributor

As discussed in #682, this PR updates the pubsubclient library used for MQTT support to 2.10.

Initially the update did not work, because unfortunately the latest version 2.10 introduced a copy-paste bug that shows up when a password is given for the MQTT client: the password was appended to the wrong buffer.

Speaking of buffers: The new version uses one buffer for receiving values and a separate one for publishing to topics, so the client does not break if you publish from the callback. The big issue with the current client code (besides the still existing keep-alive problem) is that the main receive loop() uses the same buffer as the publishing code, which may lead to problems when you publish to MQTT from the callback, which is what BSB-LAN does (we publish our status from the callback).
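To make the shared-buffer hazard concrete, here is a minimal sketch. `ToyClient`, its members and the payload strings are hypothetical and only model where incoming and outgoing bytes are stored; this is not the pubsubclient implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <utility>

// Hypothetical toy client, NOT the real PubSubClient: it only models where
// the bytes of incoming and outgoing messages are stored.
class ToyClient {
public:
    using Callback = std::function<void(const char* payload)>;

    explicit ToyClient(bool separateBuffers) : separate_(separateBuffers) {}

    void setCallback(Callback cb) { callback_ = std::move(cb); }

    // Simulates loop() receiving a message: the payload is stored in the
    // receive buffer and the callback gets a pointer INTO that buffer.
    void receive(const char* payload) {
        copyInto(rx_, payload);
        if (callback_) callback_(rx_);
    }

    // Simulates publish(): the outgoing packet is assembled in the send
    // buffer, which is the receive buffer when only one buffer exists.
    void publish(const char* payload) {
        copyInto(separate_ ? tx_ : rx_, payload);
    }

private:
    static constexpr std::size_t kSize = 64;

    static void copyInto(char* dst, const char* src) {
        std::strncpy(dst, src, kSize - 1);
        dst[kSize - 1] = '\0';
    }

    bool separate_;
    char rx_[kSize] = {0};
    char tx_[kSize] = {0};
    Callback callback_;
};

// Publishes a status from inside the callback, then reads the received
// payload again and returns what the callback sees at that point.
std::string payloadSeenAfterPublish(bool separateBuffers) {
    ToyClient client(separateBuffers);
    std::string seen;
    client.setCallback([&](const char* payload) {
        client.publish("status=online");  // publish from inside the callback
        seen = payload;                   // read the incoming payload afterwards
    });
    client.receive("set=21.5");
    return seen;
}
```

With a single buffer, `payloadSeenAfterPublish(false)` returns the outgoing status text instead of the received payload; with separate buffers the received payload survives. Whether this matters in practice depends on whether the callback still needs the incoming message after it has published.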

I also now understood the code around the keepalive:

  • the keepalive should be larger, the old code set it to 120 seconds. I lowered it to 90s. The keepalive is sent to the server on connection, and because of this it has to be initialized early. The previous code did not do this correctly: the keepalive MUST be set before connecting
  • the socket timeout should be small before connecting, because the value of 120s (that @fredlcore committed recently on my request) causes a connection attempt to fail only after 120s when the server is not reachable or the handshake fails. As this part should be fast and a hanging connection should not stop BSB-LAN from progressing, I explicitly initialize the socket timeout to the default value BEFORE connecting
  • after the connection is established, the socket timeout is raised to 120s. This makes sure that the pubsubclient waits long enough for broker responses (e.g. the ping). If the MQTT connection fails later, BSB-LAN will block for 120 seconds, but will recover soon after disconnecting and reconnecting. I think this is acceptable.
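The ordering described in the three points above can be sketched as follows. `FakeMqttClient` is a hypothetical stand-in that only records what was configured when `connect()` is called; its setter names mirror the PubSubClient setters, and the 15 s default values are assumptions for this sketch.

```cpp
#include <cstdint>

// Hypothetical stand-in for the MQTT client, not the library class.
// It records which values were in effect at the moment connect() ran.
struct FakeMqttClient {
    std::uint16_t keepAlive = 15;      // seconds, assumed library default
    std::uint16_t socketTimeout = 15;  // seconds, assumed library default
    std::uint16_t keepAliveSentToBroker = 0;
    std::uint16_t socketTimeoutDuringConnect = 0;
    bool brokerReachable = true;

    void setKeepAlive(std::uint16_t seconds) { keepAlive = seconds; }
    void setSocketTimeout(std::uint16_t seconds) { socketTimeout = seconds; }

    bool connect() {
        // The keep-alive travels in the CONNECT packet, so only the value
        // set BEFORE this call is ever seen by the broker.
        keepAliveSentToBroker = keepAlive;
        socketTimeoutDuringConnect = socketTimeout;
        return brokerReachable;
    }
};

constexpr std::uint16_t kDefaultSocketTimeout = 15;     // assumed default
constexpr std::uint16_t kKeepAlive = 90;                // value from this comment
constexpr std::uint16_t kConnectedSocketTimeout = 120;  // value from this comment

bool connectWithTimeouts(FakeMqttClient& client) {
    client.setKeepAlive(kKeepAlive);                   // 1. before connecting
    client.setSocketTimeout(kDefaultSocketTimeout);    // 2. short while connecting
    if (!client.connect()) return false;               //    fails fast if unreachable
    client.setSocketTimeout(kConnectedSocketTimeout);  // 3. generous once connected
    return true;
}
```

Resetting the socket timeout in step 2 matters on a retry: after an earlier successful connection the client still carries the raised 120 s value.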

The keepalive code still has the initialization problem as mentioned before.

For all issues in version 2.10 of the pubsubclient library I will open a PR on their repository tomorrow. Maybe they will release 2.11 soon. The password issue is the biggest problem: a copy-paste error where send_buffer was used instead of receive_buffer, which is a serious bug! The keepalive handling works better now.

I marked all lines of code that I patched in the current version with a comment. When you update to 2.11 or later, the lines are hopefully obsolete.

I have this new code now running on my box. Observations:

  • No multiple connections on booting BSB-LAN. Previously the code often reconnected initially, which was an issue with the single buffer for receive and publishing.

I am running a stress test at the moment. If all works well tomorrow morning, feel free to commit this.

@@ -225,6 +224,7 @@ bool mqtt_connect() {
MQTTPubSubClient->setServer(mqtt_host, mqtt_port);
printFmtToDebug("Client ID: %s\r\n", mqtt_get_client_id());
printFmtToDebug("Will topic: %s\r\n", mqtt_get_will_topic());
MQTTPubSubClient->setSocketTimeout(MQTT_SOCKET_TIMEOUT); // reset to default
Contributor Author

the reset is required if the connection is retried after disconnect!

@fredlcore
Owner

Thanks for looking into this.

the keepalive should be larger, the old code set it to 120 seconds. I lowered it to 90s.

This does not make sense to me. If it should be larger (and was set to 120s before), why lower it to 90s?

If the MQTT connection fails later, BSB-LAN will block for 120 seconds, but will recover soon after disconnecting and reconnecting.

This one, I also don't understand. If the connection fails later, a 120s block would still be unacceptable. And why/how would it recover soon after disconnecting/reconnecting?
I still don't see the huge advantage of changing these timeouts. Yes, sometimes the connection gets disconnected so far, but it also reconnects immediately then, but that's more of a "cosmetic" problem. So far, no one has mentioned lost packets etc.

While the use of different buffers is certainly good, I don't think it affects BSB-LAN in the current code. Yes, the status is published from the callback, but at that time, the content of the previous message in the buffer is no longer relevant. Since we figured out that there can't be any kind of race condition, the dual use of the buffer shouldn't be an issue for us.

Bottom line, if the new fork still has serious issues, I'd rather want to wait until they have fixed those upon your PR. The current situation is not critical, and keeping an eye on modified lines in the library code is not something I can afford timewise. So I'll leave this open and once I get a go from your side that all your PRs in that fork have been dealt with, I can merge it here.

@uschindler
Contributor Author

Hi,
I fully agree. The serious issue is already worked on. Basically my fix above solves the issue when connecting with password.
For me the deployed version runs mostly stably. I have seen one disconnect, but not random ones all the time.

I will respond later regarding the timeout values and why I changed them. It's easy to understand, but as before there's no proof that this fully solves the issues with delays. The idea is to weigh the importance of delays on the BSB against delays on the MQTT connection.

I will give more details about the idea, so you can also think about it.

@uschindler
Contributor Author

Here is the fix for the "serious" issue, already submitted 3 days ago: thingsboard/pubsubclient#11

@uschindler
Contributor Author

Hi,
there are still random disconnects (but they are much less frequent).

I added some small code changes to improve the debugging output in order to figure out why the connection was lost, see 8fe8a2d. The pubsubclient records the failure reason in its connection state. The state is exposed through the state() function. I modified the log messages to include the value of the state, so you can look it up in the defines. It tells you whether the connection timed out or whether something else went wrong on the connection (like invalid packets received, or the connection being closed by the broker).
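As a small aid for reading such log lines, the lookup can be sketched like this. The numeric values are the ones defined in the original knolleary PubSubClient.h; that the thingsboard fork keeps them unchanged is an assumption, so check the header you actually ship. `mqttStateName` and `describeMqttState` are hypothetical helper names, not library functions.

```cpp
#include <string>

// Maps the integer returned by state() to the name of its define.
// Values taken from the original PubSubClient.h (assumed unchanged in the fork).
const char* mqttStateName(int state) {
    switch (state) {
        case -4: return "MQTT_CONNECTION_TIMEOUT";
        case -3: return "MQTT_CONNECTION_LOST";
        case -2: return "MQTT_CONNECT_FAILED";
        case -1: return "MQTT_DISCONNECTED";
        case 0:  return "MQTT_CONNECTED";
        case 1:  return "MQTT_CONNECT_BAD_PROTOCOL";
        case 2:  return "MQTT_CONNECT_BAD_CLIENT_ID";
        case 3:  return "MQTT_CONNECT_UNAVAILABLE";
        case 4:  return "MQTT_CONNECT_BAD_CREDENTIALS";
        case 5:  return "MQTT_CONNECT_UNAUTHORIZED";
        default: return "unknown";
    }
}

// Builds a log-friendly description that contains both number and name.
std::string describeMqttState(int state) {
    return "MQTT state " + std::to_string(state) + " (" + mqttStateName(state) + ")";
}
```

Logging the number together with its name avoids having to open the header each time a disconnect shows up on the console.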

In addition, I moved the parsing of hostname/port a few lines down (see 76b156b), because at the moment it is executed in the main loop on every call to mqtt_connect(). That is more work than necessary: the hostname/port pair is only needed when actually connecting or reconnecting, so allocating memory to copy strings around and calling strtok() on every pass is a waste of resources.
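The idea of parsing only when a connection attempt is actually needed can be sketched as below. `HostPort`, `parseHostPort` and `Connector` are hypothetical names for this illustration and use `std::string` instead of the `strtok()`-based code in BSB-LAN; the fallback port 1883 is the standard unencrypted MQTT port.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>

struct HostPort {
    std::string host;
    std::uint16_t port;
};

// Splits "host:port" at the last colon; falls back to defaultPort when no
// valid port is given. (IPv6 literals are not handled in this sketch.)
HostPort parseHostPort(const std::string& setting, std::uint16_t defaultPort = 1883) {
    HostPort result{setting, defaultPort};
    const std::size_t colon = setting.rfind(':');
    if (colon == std::string::npos) return result;
    result.host = setting.substr(0, colon);
    const long port = std::strtol(setting.c_str() + colon + 1, nullptr, 10);
    if (port > 0 && port <= 65535) result.port = static_cast<std::uint16_t>(port);
    return result;
}

// Hypothetical connection wrapper: parsing happens only on the slow path.
struct Connector {
    int parseCount = 0;
    bool connected = false;

    bool ensureConnected(const std::string& setting) {
        if (connected) return true;  // hot path: no parsing, no allocation
        const HostPort target = parseHostPort(setting);
        ++parseCount;
        connected = !target.host.empty();  // stand-in for the real connect call
        return connected;
    }
};
```

Calling `ensureConnected()` repeatedly while the connection is up leaves `parseCount` at 1, which is the effect the reordering aims for.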

I will now record the serial console for a full overview including connection states.

P.S.: I am just documenting here what I am doing because I want to have my research publicly available!

@uschindler
Contributor Author

I opened a PR with the improvements collected from several issues regarding keepalive problems with several brokers. It looks like mosquitto has made some changes to protect itself from too many pings. As a result, the client code may end up waiting for a ping reply that never arrives.

At the moment I am testing the additional changes, which are part of the PRs thingsboard/pubsubclient#14 (mine) and thingsboard/pubsubclient#11

@uschindler
Contributor Author

Hi, they released a new version: https://github.com/thingsboard/pubsubclient/releases/tag/v2.11.0

@fredlcore
Owner

Great, just let me know once all the relevant problems are fixed and the update is not just an in-between-versions solution, then I can integrate it into BSB-LAN.

@uschindler uschindler changed the title Update pubsubclient to v2.10 by thingsboard Update pubsubclient to v2.11.0 by thingsboard Dec 9, 2024
@uschindler
Contributor Author

uschindler commented Dec 9, 2024

Hi,
I updated the PR here. The code has now been running for 3 days, and disconnects (especially those happening after bootup) have been dramatically reduced.

There are still some disconnects, but those were more related to the ESP32 socket API failing for some reason and not because of keepalive handling. At least my mosquitto no longer tells me "disconnected due to keep alive timeout".

The PR here provides:

  • it updates the pubsubclient library
  • it fixes some keep-alive problems in the code, which happen when you set the logging time > keep-alive time. It also fixes several security issues with packet lengths that were not checked correctly, and it separates the send and receive buffers
  • the keep-alive time is unchanged from the original code (120 seconds), but the original code had a problem with setting it up correctly: it raised the keep-alive time AFTER the first connection, which is incorrect, because the client communicates the value to the broker during protocol setup. So the original code reported 15 seconds on connect (the default) and later (after connecting) changed the value to 120s. This caused disconnects because the client sent pings only after 120 seconds without wire activity. This was not a serious issue, because with a logging delay of 30 seconds the ping is not needed. But when the default of 300s is used, it disconnected. It also disconnected during startup because of maintenance tasks. In addition, the problem resolved itself, because when the client reconnected after the disconnect by the server, the correct value of 120 seconds was already set on the pubsubclient, so it was communicated correctly to the broker on connection. If you need more information about the why, please ask. It's hard to explain, because the issue only happens once when booting the BSB-LAN code.
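The mismatch between the advertised and the locally used keep-alive can be put into numbers. The sketch below relies on the MQTT 3.1.1 rule that a broker disconnects a client it has not heard from for one and a half times the keep-alive announced in CONNECT. The function names are hypothetical, and the model is simplified: real brokers may check on a coarser schedule, and startup or maintenance work is not modelled.

```cpp
#include <algorithm>

// Broker side: the deadline follows from the keep-alive sent in CONNECT.
double brokerDeadlineSeconds(double advertisedKeepAlive) {
    return 1.5 * advertisedKeepAlive;
}

// Client side: the longest stretch without any packet on the wire is bounded
// by whichever comes first, the next ping or the next regular publish.
double longestSilenceSeconds(double clientPingInterval, double publishInterval) {
    return std::min(clientPingInterval, publishInterval);
}

// True when the client is never silent for longer than the broker tolerates.
bool connectionSurvives(double advertisedKeepAlive,
                        double clientPingInterval,
                        double publishInterval) {
    return longestSilenceSeconds(clientPingInterval, publishInterval)
           <= brokerDeadlineSeconds(advertisedKeepAlive);
}
```

With 15 s advertised but pings only every 120 s and a 300 s logging interval, `connectionSurvives(15, 120, 300)` is false. Once 120 s is also what the broker was told, `connectionSurvives(120, 120, 300)` is true, because the 120 s of silence stays below the 180 s deadline.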

So it is up to you if you want to merge now:

  • it updates the library to the newest version, as the old one is no longer maintained
  • it fixes a bug with communicating the keep-alive in the original mqtt_handler.h code. It must be set BEFORE connecting, not after. This is a real bug and leads to disconnects
  • some random disconnects are still not solved, but as they happen only during high heater activity and also cause disconnects of a telnet connection, it seems to be an issue with the networking stack of the ESP32 API.

In the longer term, we should drop Arduino Due support and switch to the ESP32 MQTT API only. The ESP32 has an MQTT client included in its toolkit, which works asynchronously, is implemented in the core of the ESP32 library, and also supports the latest MQTT v5: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/protocols/mqtt.html

By the way: You opened an issue on the original bugtracker about the pubsubclient not being able to connect to a PicoMQTT instance running on the ESP32's localhost. Actually this is not a fault of the client; it's the ESP32's network API that can't handle this. You can't even connect to the local webserver from inside the ESP32 code. "localhost" is not supported, as the networking API requires packets to always be sent over the wire. Unlike on Linux, there's no fully implemented local TCP/IP stack. A socket always goes through the network card.

},
"version": "2.8",
"version": "2.10.0",
Contributor Author

for some reason they forgot to update the version number in the JSON file of the release.

Contributor Author

they already forgot this on the last release 2.10.1. I don't know what the JSON file is used for.

@uschindler
Contributor Author

Do you prefer to squash all changes into one commit? GitHub can do this automatically, but some projects prefer that the PR creator does it.

@fredlcore
Owner

For me, the question is whether all these changes in the code are now also part of the actual library. I don't want to have to check the code in a few months when another update is on the table. Basically, I just want to be able to pull from their repo and add it to mine. If that might be a step back because not all of the above is part of the library's main repo, I'd rather keep their library as it is than fix bugs now and inadvertently reintroduce them at a later stage.

@uschindler
Contributor Author

The files in the src/ folder are all 1:1 copies of the downloaded release artifact. Only the "test" folder was excluded because it didn't exist before.

The only local modifications by myself are in BSB-LAN's mqtt_handler.h.

Uwe

@fredlcore
Owner

And on what grounds do you think we should drop Arduino Due support? There are still a lot of users out there who use it, and I'm not at all a fan of just throwing away "old" technology just because it's Black Friday. Furthermore, MQTT might be part of the espressif API, but I'm not using the espressif API but the Arduino API, obviously. Since the examples are all based on the espressif API, they are not applicable to this project, unless I've overlooked this being implemented in the Arduino API.

@fredlcore
Owner

Great, so that means everything is ready for merging now? Then I'll proceed asap...

@uschindler
Contributor Author

And on what grounds do you think we should drop Arduino Due support? There are still a lot of users out there who use it, and I'm not at all a fan of just throwing away "old" technology just because it's Black Friday. Furthermore, MQTT might be part of the espressif API, but I'm not using the espressif API but the Arduino API, obviously. Since the examples are all based on the espressif API, they are not applicable to this project, unless I've overlooked this being implemented in the Arduino API.

Let's keep that discussion out of this issue.

There is an Arduino library for MQTT available that wraps the espressif API. My comment was meant as a discussion point about moving the project forward. It has nothing to do with fixing this issue. In 4.x you also dropped support for older hardware, so for the long-term planning of 5.x I would put all options on the table and decide what is healthy to keep maintenance under control and reduce the number of compilation targets.

I made a suggestion based on observations, it's your turn to think about it.

@uschindler
Contributor Author

Great, so that means everything is ready for merging now? Then I'll proceed asap...

Yes!

You can verify this yourself after merging by copying in the source code of the pubsubclient release and checking that git diff reports no changes.

@fredlcore fredlcore merged commit ebb5559 into fredlcore:master Dec 9, 2024
@uschindler uschindler deleted the dev/update_2_10 branch December 9, 2024 15:10
@fredlcore
Owner

What library is it that wraps the espressif MQTT library/API? I would then check if it can be used as a drop-in replacement for PubSubClient for the ESP32 part of the code. But I'm not sure if async MQTT would actually be an advantage if the other components are not. After all, the BSB/LPB bus requires serialized access, so the advantages of asynchronous access are limited, IMHO.
The last time we dropped development for older hardware was for the Arduino Mega, and that was in 2020, and only due to the fact that the code size exceeded the Mega's capacity. Nevertheless, we left the Mega-code intact, so that users could apply bugfixes by themselves even afterwards. The Mega-related code was only removed a year or two later, IIRC.
I won't rule out that there is a sunset for the Due as well at some point, but that is more related to the fact that the Arduino guys themselves have phased out the Due. So once critical libraries will no longer be developed and/or won't work without significant extra efforts, that will be the time, but I don't see that coming anytime soon.

@fredlcore
Owner

By the way, the new code does not compile for the Due infrastructure:

In file included from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/char_traits.h:39:0,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/string:40,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/stdexcept:39,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/array:39,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/tuple:39,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/functional:54,
                 from src/BSB_LAN/src/PubSubClient/src/PubSubClient.h:81,
                 from /Users/frederik/Documents/PlatformIO/Projects/BSB-LAN/src/BSB_LAN/BSB_LAN.ino:234:
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:243:56: error: macro "min" passed 3 arguments, but takes just 2
     min(const _Tp& __a, const _Tp& __b, _Compare __comp)
                                                        ^
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:265:56: error: macro "max" passed 3 arguments, but takes just 2
     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
                                                        ^
In file included from /Users/frederik/.platformio/packages/framework-arduino-sam/system/libsam/chip.h:66:0,
                 from /Users/frederik/.platformio/packages/framework-arduino-sam/cores/arduino/Arduino.h:42,
                 from /var/folders/jf/x456vzg54wx05y247tqvmln00000gn/T/tmp_2r_7vvk:1:
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:195:5: error: expected unqualified-id before 'const'
     min(const _Tp& __a, const _Tp& __b)
     ^
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:195:5: error: expected ')' before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:195:5: error: expected ')' before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:195:5: error: expected ')' before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:195:5: error: expected initializer before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:219:5: error: expected unqualified-id before 'const'
     max(const _Tp& __a, const _Tp& __b)
     ^
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:219:5: error: expected ')' before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:219:5: error: expected ')' before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:219:5: error: expected ')' before 'const'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:219:5: error: expected initializer before 'const'
In file included from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/char_traits.h:39:0,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/string:40,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/stdexcept:39,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/array:39,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/tuple:39,
                 from /Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/functional:54,
                 from src/BSB_LAN/src/PubSubClient/src/PubSubClient.h:81,
                 from /Users/frederik/Documents/PlatformIO/Projects/BSB-LAN/src/BSB_LAN/BSB_LAN.ino:234:
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:246:7: error: expected primary-expression before 'if'
       if (__comp(__b, __a))
       ^~
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:246:7: error: expected '}' before 'if'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:246:7: error: expected ';' before 'if'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:248:7: error: expected unqualified-id before 'return'
       return __a;
       ^~~~~~
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:268:7: error: expected primary-expression before 'if'
       if (__comp(__a, __b))
       ^~
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:268:7: error: expected '}' before 'if'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:268:7: error: expected ';' before 'if'
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:270:7: error: expected unqualified-id before 'return'
       return __a;
       ^~~~~~
/Users/frederik/.platformio/packages/toolchain-gccarmnoneeabi/arm-none-eabi/include/c++/7.2.1/bits/stl_algobase.h:271:5: error: expected declaration before '}' token
     }
     ^
*** [.pio/build/due-PPS/src/BSB_LAN.ino.cpp.o] Error 1
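Errors such as `macro "min" passed 3 arguments, but takes just 2` are the typical symptom of function-like `min`/`max` macros being active while a C++ standard header is parsed. One common workaround is to remove the macros before the include. The sketch below reproduces the situation in isolation; the two macro definitions are stand-ins written for this example (an assumption, not copied from any core header), and `smallestByMagnitude` is a hypothetical function.

```cpp
// Stand-ins for the function-like macros an Arduino core may define.
// (Assumption for this sketch; not copied from any core header.)
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) > (b) ? (a) : (b))

// Remove the macros BEFORE pulling in standard headers, otherwise the
// preprocessor rewrites the declarations of std::min / std::max.
#undef min
#undef max

#include <algorithm>
#include <functional>  // the header named in the include chain of the log

// With the macros gone, the standard templates are usable again, including
// the three-argument overload that triggered the first error in the log.
int smallestByMagnitude(int a, int b) {
    return std::min(a, b, [](int x, int y) { return x * x < y * y; });
}
```

The drawback is that code further down which relied on the macros has to call `std::min`/`std::max` instead, and a locally patched library header needs the same patch again after every update.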

@uschindler
Contributor Author

I haven't compiled that on Due. The issue is some change in the PubSubClient.h file that was part of this change:
thingsboard/pubsubclient@e49f361

The ifdefs looked a bit strange to me. Not sure how to handle this. You can either revert this PR, or we can work around it.

@fredlcore
Owner

Found the "solution":
https://community.platformio.org/t/project-build-fails-after-latest-updates/8731/2
and added that to PubSubClient.h. Meaning any future update will have to add this :(.
What worries me more now is this remaining warning:

Compiling .pio/build/due-PPS/src/src/PubSubClient/src/PubSubClient.cpp.o
In file included from /Users/frederik/Documents/PlatformIO/Projects/BSB-LAN/src/BSB_LAN/BSB_LAN.ino:1646:0:
src/BSB_LAN/include/mqtt_handler.h: In function 'void mqtt_disconnect()':
src/BSB_LAN/include/mqtt_handler.h:275:12: warning: deleting object of polymorphic class type 'PubSubClient' which has non-virtual destructor might cause undefined behavior [-Wdelete-non-virtual-dtor]
     delete MQTTPubSubClient;
            ^~~~~~~~~~~~~~~~

Based on suggestions on stackexchange, changing PubSubClient.h to this:

//   ~PubSubClient();
   virtual ~PubSubClient();        // needed to remove warning for Arduino Due

removes the warning, but I know too little about polymorphism in C++ to be sure that this is not just hiding the issue. Do you have any suggestion here?
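For background on the warning: deleting an object through a pointer to a base class is only well-defined when that base class has a virtual destructor. The compiler cannot see which exact type the pointer refers to at run time, so it warns whenever a class with virtual functions is deleted through a pointer and its destructor is not virtual. The sketch below uses hypothetical classes (`Sink`, `BufferedSink`), not the PubSubClient hierarchy, to show what the virtual destructor guarantees.

```cpp
// Counts destructor calls so the effect can be observed.
struct Counters {
    int baseDestroyed = 0;
    int derivedDestroyed = 0;
};

// Hypothetical base class for illustration only.
class Sink {
public:
    explicit Sink(Counters& counters) : counters_(counters) {}
    // Virtual destructor: deleting through a Sink* runs the destructor of
    // the object's actual type first, then this one.
    virtual ~Sink() { ++counters_.baseDestroyed; }
    virtual int write(int byte) = 0;

protected:
    Counters& counters_;
};

class BufferedSink : public Sink {
public:
    explicit BufferedSink(Counters& counters)
        : Sink(counters), buffer_(new int[16]) {}
    ~BufferedSink() override {
        delete[] buffer_;  // would be skipped if the base destructor were not virtual
        ++counters_.derivedDestroyed;
    }
    int write(int byte) override {
        buffer_[0] = byte;
        return 1;
    }

private:
    int* buffer_;
};

Counters destroyThroughBasePointer() {
    Counters counters;
    Sink* sink = new BufferedSink(counters);
    sink->write(42);
    delete sink;  // runs ~BufferedSink() and then ~Sink()
    return counters;
}
```

If a pointer always refers to an object whose exact type is the class named in the pointer type, the delete is well-defined either way; in that case making the destructor virtual is a defensive change that silences the warning rather than one that masks a bug.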

@uschindler
Contributor Author

Are you sure that this is new? See the old issue in the original knolleary pubsubclient:
knolleary/pubsubclient#625

It has to do with inheritance and some incompatibility with extending Print class. Interestingly this was the same in old code, so the version 2.8 as used before should have produced same warning!

@uschindler
Contributor Author

The alternative solution is to remove the extension based on the Print class. As said in above issue, it is no longer needed in the code, but was kept for backwards compatibility.

@uschindler
Contributor Author

change line 93 to:

class PubSubClient {

This should remove the warning and makes the PubSubClient no longer pointlessly extend from Print.

@uschindler
Contributor Author

We should maybe open an issue. Extending from Print makes no sense.

@fredlcore
Owner

Interestingly this was the same in old code, so the version 2.8 as used before should have produced same warning!

Could be. Once it is compiled and the code isn't changed, the warnings don't come up again. I saw this one now because compilation broke in the GitHub workflow and then this warning also came up.
What would be safer? This approach:

//   ~PubSubClient();
   virtual ~PubSubClient();        // needed to remove warning for Arduino Due

or this one?

class PubSubClient {

@fredlcore
Owner

BTW, the latter approach does not remove the warning...

@uschindler
Contributor Author

uschindler commented Dec 9, 2024

Sorry for the issue, unfortunately the GitHub workflow was not running on the original PR!

The way to fix the issue with the destructor is to add a virtual destructor. I wonder why removing the multiple inheritance does not help.

@fredlcore
Owner

fredlcore commented Dec 9, 2024

Add a virtual constructor or destructor?

@uschindler
Contributor Author

Add a virtual constructor or destructor?

sorry was a typo, corrected my response.

@uschindler
Contributor Author

Thanks for fixing the issues. I compiled your master branch and deployed it, all looks fine.

@fredlcore
Owner

Thanks for your efforts as well!

@uschindler
Contributor Author

uschindler commented Dec 11, 2024

Hello @fredlcore ,
I wanted to give some quick feedback on the random (and very seldom) disconnects that are still occurring. It really seems to be the heater. I had a serial console connected, but of course nothing happened until I removed it yesterday evening. MQTT reconnected again this morning and then also a minute ago (unfortunately without a console). But what I was able to observe: When I looked at the heater display, shortly after the reconnect (I was standing right next to it), the hourglass was flashing. According to the ISR-Plus instructions, this means "When the hourglass appears on the display, the control unit is sending data to the controller. Depending on the amount of data transferred, this process can take several seconds. No further settings can be made during this time. Wait until the hourglass symbol goes out."

In short, it is as expected: Communication on the bus is probably stuck because either the central unit or something else is not responding. The hourglass was flashing for almost 2 minutes, while the heater was still working normally.

I'll try to connect the serial console again, but it was the same in the TELNET console. It seems that the ESP32 network firmware simply resets itself when communication on the Ethernet breaks down because packets are no longer processed. This then leads to the MQTT disconnect (which, according to Mosquitto, was initiated by BSB-LAN).

In short: Let's leave it at that for now, the disconnects are triggered by the heating, and the single-threaded ESP32 then crashes the network. The idea with the MQTT loop wasn't actually bad, but it caused other problems.

Only idea (please don't kill me for the stupid question): Does a sleep() also include a yield()? I've often read code that calls the yield() function in "waiting/spin loops"? The BSB-BUS wait is a sleep/delay and not a yield.

P.S.: The central unit LMU74 is only 2 years old, so hopefully it doesn't have any broken capacitors yet.

@uschindler
Contributor Author

Only idea (please don't kill me for the stupid question): Does a sleep() also include a yield()? I've often read code that calls the yield() function in "waiting/spin loops"? The BSB-BUS wait is a sleep/delay and not a yield.

Ignore that, delay() calls yield() anyway.

@fredlcore
Owner

The problem you are describing shouldn't be affecting BSB-LAN. The hourglass for a longer duration usually occurs only after power-up. In that process, the display unit and the heater go through the same procedure that BSB-LAN mimics with the device-specific parameter list, i.e. exchanging information about the available parameters. However, this shouldn't affect BSB-LAN's performance. If the bus is busy, then the requests to query parameters time out. There are three retries, and each retry is given 3 seconds, so after about 10 seconds, BSB-LAN should operate normally again until the next parameter is polled. So with the timeouts you have set now, that shouldn't be an issue, IMHO.
