-
-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update pubsubclient to v2.11.0 by thingsboard #683
Conversation
…ail; improve keep-alive handling
BSB_LAN/include/mqtt_handler.h
Outdated
@@ -225,6 +224,7 @@ bool mqtt_connect() { | |||
MQTTPubSubClient->setServer(mqtt_host, mqtt_port); | |||
printFmtToDebug("Client ID: %s\r\n", mqtt_get_client_id()); | |||
printFmtToDebug("Will topic: %s\r\n", mqtt_get_will_topic()); | |||
MQTTPubSubClient->setSocketTimeout(MQTT_SOCKET_TIMEOUT); // reset to default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reset is required if the connection is retried after disconnect!
Thanks for looking into this.
This does not make sense to me. If it should be larger (and set to 120s before), why lowering it to 90s?
This one, I also don't understand. If the connection fails later, a 120s block would still be unacceptable. And why/how would it recover soon after disconnecting/reconnecting? While the use of different buffers is certainly good, I don't think it affects BSB-LAN in the current code. Yes, the status is published from the callback, but at that time, the content of the previous message in the buffer is no longer relevant. Since we figured out that there can't be any kind of race condition, the dual use of the buffer shouldn't be an issue for us. Bottom line, if the new fork still has serious issues, I'd rather want to wait until they have fixed those upon your PR. The current situation is not critical, and keeping an eye on modified lines in the library code is not something I can afford timewise. So I'll leave this open and once I get a go from your side that all your PRs in that fork have been dealt with, I can merge it here. |
Hi, I will respond regarding the timeout values and why I changed it later. It's easy to understand, but as before there's no proof if this fully solves the issues with delays. The idea is to prioritize the importance of: delays on BSB vs. Delays on mqtt connection. I give more details about the idea, so you can also think about it. |
Here is the fix for the "serious" issue, already submitted 3 days ago: thingsboard/pubsubclient#11 |
…not called when connection already exists
Hi, I added some small code changes to improve debbugging output to figure out why the connection was lost, see 8fe8a2d. The pubsubclient records the failure reason in its connection state. The state is exposed with the In addition, I moved the parsing of hostname/port a few lines down (see 76b156b), because at moment it is always executed in the main loop on every call to I will now record the serial console for a full overview including connection states. P.S.: I am just documenting here what I am doing because I want to have my recherche publicly available! |
I opened a PR with the improvements collected from several issues regarding keepalive problems with several brokers. It looks like mosquitto has some changes regarding protecting itsself from too many pings. Therefore the code may get into the state to wait for a ping reply but never gets one. I am testing the additional changes at moment which are part of the PR thingsboard/pubsubclient#14 (mine) and thingsboard/pubsubclient#11 |
Hi, they released new version: https://github.com/thingsboard/pubsubclient/releases/tag/v2.11.0 |
Great, just let me know once all the relevant problems are fixed and the update is not just an in-between-versions solution, then I can integrate it into BSB-LAN. |
Hi, There are still some disconnects, but those were more related to the ESP32 socket API failing for some reason and not because of keepalive handling. At least my mosquitto no longer tells me "disconnected due to keep alive timeout". The PR here provides:
So it is up to you if you want to merge now:
On the longer term, we should drop Arduio Duo support and switch to ESP32 MQTT API only. ESP32 has an MQTT client included in its toolkit which works async and is implemented in the core of the ESP32 library, which also supports latest MQTT v5: https://docs.espressif.com/projects/esp-idf/en/stable/esp32/api-reference/protocols/mqtt.html By the way: You opened an issue on the original bugtracker about the pubsubclient to being able to connect to an PicoMQTT running on the ESP32 localhost. Actually this is not a fault of the client, it's ESP32's network API that can't handle this. You can't even connect to the local webserver from inside the ESP32 code. "localhost" is not supported, as the networking API requires packages always sent over the wire. In difference to Linux, theres no local fully implemented TCP/IP stack. A socket always goes over network card. |
}, | ||
"version": "2.8", | ||
"version": "2.10.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for some reason they forgot to update the version number in the JSON file of release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they already forgot this on last release 2.10.1. I don't know for what the JSON file is used.
Do you prefer to squash all changes to one commit? Github can do this automatically, but some projects to prefer that the PR creator does it. |
For me, the question is whether all these changes in the code are also now part of the actual library. I don't want to have to check the code if in a few months when another update is on the table. Basically, I just want to be able to pull from their repo and add it to mine. If that might be a step back because not all of the above is part of the library's main repo, I rather keep their library as it is, rather than fixing bugs now and then inadvertedly reintroducing them at a later stage again. |
The files in the src/ folder are all 1:1 copies of the downloaded release artifact. Only the "test" folder was excluded because it didn't exist before. The only local modifications by myself are in BSB-LAN's Uwe |
And on what grounds do you think we should drop Arduino Due support? There are still a lot of users out there who use it, and I'm not at all a fan of just throwing away "old" technology just because it's Black Friday. Furthermore, MQTT might be part of the espressif API, but I'm not using the espressif API but the Arduino API, obviously. Since the examples are all based on the espressif API, they are not applicable to this project, unless I've overlooked this being implemented in the Arduino API. |
Great, so that means everything is ready for merging now? Then I'll proceed asap... |
Let's keep that discussion out of this issue. There is an Arduino library for MQTT available that wraps the expressif API. The comment I did was a discussion point about moving a project forward. It has nothing to do with fixing this issue. In 4.x you also dropped support for older hardware, so on long term planning for 5.x I would put all options together and decide what's healthy to keep maintenance under control and reduce number of compilation targets. I made a suggestion based on observations, it's your turn to think about it. |
Yes! You can check it after merging on your own by copying the source code of the pubsubclient release and check that |
What library is it that wraps the espressif MQTT library/API? I would then check if it can be used as a drop-in replacement for PubSubClient for the ESP32 part of the code. But I'm not sure if async MQTT would actually be an advantage if the other components are not. After all, the BSB/LPB bus requires serialized access, so the advantages of asynchronous access are limited, IMHO. |
By the way, the new code does not compile for the Due infrastructure:
|
I haven't compiled that on Due. The issue is some change in the PubSubClient.h file that was part of this change: The ifdefs looked a bit strange to me. Not sure how to handle this. You can either revert this PR, or we can work around it. |
Found the "solution":
Based on suggestions on stackexchange, changing PubSubClient.h to this:
removes the warning, but I know too little about polymorphism in C++ to be sure that this is not just hiding the issue. Do you have any suggestion here? |
Are you sure that this is new? See the old issue in the original knolleary pubsubclient: It has to do with inheritance and some incompatibility with extending Print class. Interestingly this was the same in old code, so the version 2.8 as used before should have produced same warning! |
The alternative solution is to remove the extension based on the |
change line 93 to: class PubSubClient { This should remove the warning and makes the PubSubClient no longer pointlessly extend from |
We should maybe open an issue. Extending from |
Could be. Once it is compiled and the code isn't changed, the warnings don't come up again. I saw this one now because compilation broke in the GitHub workflow and then this warning also came up.
or this one?
|
BTW, the latter approach does not remove the warning... |
Sorry for the issue, unfortunately the github workflow was not running on original PR ! If you want to fix the issue with the desctructor is to add a virtual destructor. I wonder why removing the multiple inheritance does not help. |
Add a virtual constructor or destructor? |
sorry was a typo, corrected my response. |
Thanks for fixing the issues. I compiled your master branch and deployed it, all looks fine. |
Thanks for your efforts as well! |
Hello @fredlcore , In short, it is as expected: Communication on the bus is probably stuck because either the central unit or something else is not responding. The hourglass was flashing for almost 2 minutes, while the heater was still working normally. I'll try to connect the serial console again, but it was the same in the TELNET console. It seems that the ESP32 network firmware simply resets itself when communication on the Ethernet breaks down because packets are no longer processed. This then leads to the MQTT disconnect (which, according to Mosquito, was initiated by the BSB-LAN). In short: Let's leave it at that for now, the disconnects are triggered by the heating, and the single-threaded ESP32 then crashes the network. The idea with the MQTT loop wasn't actually bad, but it caused other problems. Only idea (please don't kill me for the stupid question): Does a P.S.: The central unit LMU74 is only 2 years old, so hopefully it doesn't have any broken capacitors yet. |
Ignoriere das, |
The problem you are describing shouldn't be affecting BSB-LAN. The hourglass for a longer duration usually occurs only after power-up. In that process, the display unit and the heater are doing the same process that BSB-LAN mimics with the device-specific parameter list, i.e. exchanging information about the available parameters. However, this shouldn't affect BSB-LAN's performance. If the bus is busy, then the requests to query parameters are timing out. There are three retries, and each retry is given 3 seconds, so after 10 seconds, BSB-LAN should operate normal again until the next parameter is polled. So with the timeouts you have now set before, that shouldn't be an issue, IMHO. |
A discussed in #682, this PR updates the pubsubclient for MQTT support to 2.10.
Initially the update did not work, because unfortunately the latest version 2.10 introduced a copypaste bug when a password is given for the MQTT client. The password was appended to wrong buffer.
Apropos buffers: The new version uses a separate buffer for receiving values and one for publishing to topics, so the client does not break if you publish new topics from the callback. The big issue with current client (next to the still existing keep-alive) code is that the main receive loop() uses the same buffer than the publishing code, which may lead to problems when you write to the MQTT from the callback - which BSB-LAN is doing (we publish our status from the callback).
I also now understood the code around the keepalive:
The keepalive code still has the initialization problem as mentioned before.
For all issues in version 2.10 of the pubsubclient library I will open a PR on their repository tomorrow. Maybe they release 2.11 soon. The password issue is the biggest problem (copypaste error where send_buffer was used instead of receive_buffer) is a serious bug! The keepalive handling works better now.
I marked all lines of code that I patched in the current version with a comment. When you update to 2.11 or later, the lines are hopefully obsolete.
I have this new code now running on my box. Observations:
I am at moment making the stress test. If all works well tomorrow morning, feel free to commit this.