Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(reset): Automatically reconnect if port disconnects during reset. (ESPTOOL-862) #980

Closed
wants to merge 1 commit into from

Conversation

andrewleech
Copy link
Contributor

@andrewleech andrewleech commented May 24, 2024

When RTS/DTS reset is used on S3, the device / usb controller may reset as well causing the port to disconnect.
This change detects the exception thrown when this happens (only during reset) and automatically tries to reconnect the port to continue the operatrion.

This change fixes the following bug(s):

First seen during implementation of micropython/micropython#15108
When OTG is used on S3, we need to switch back to SerialJtag controller before rebooting to bootloader after detecting RTS/DTR reset pattern on the OTG port.
Currently, esptool throws a BusError or OSError during this like

  File "/home/anl/.espressif/python_env/idf5.2_py3.8_env/lib/python3.8/site-packages/esptool/reset.py", line 108, in __call__
    self._setDTRandRTS(False, False)  # IO0=HIGH, done
  File "/home/anl/.espressif/python_env/idf5.2_py3.8_env/lib/python3.8/site-packages/esptool/reset.py", line 35, in wrapper
    ret = f(*args)
  File "/home/anl/.espressif/python_env/idf5.2_py3.8_env/lib/python3.8/site-packages/esptool/reset.py", line 67, in _setDTRandRTS
    "I", fcntl.ioctl(self.port.fileno(), TIOCMGET, struct.pack("I", 0))
OSError: [Errno 5] Input/output error

I have tested this change with the following hardware & software combinations:

I've tested this only on a LilyGO T-Display-S3 so far. I don't have access to any other S3 (or S2) devices currently.

I have run the esptool.py automated integration tests with this change and the above hardware:

NO TESTING

Copy link

github-actions bot commented May 24, 2024

Messages
📖 🎉 Good Job! All checks are passing!

👋 Hello andrewleech, we appreciate your contribution to this project!


📘 Please review the project's Contributions Guide for key guidelines on code, documentation, testing, and more.

Click to see more instructions ...


This automated output is generated by the PR linter DangerJS, which checks if your Pull Request meets the project's requirements and helps you fix potential issues.

DangerJS is triggered with each push event to a Pull Request and modify the contents of this comment.

Please consider the following:
- Danger mainly focuses on the PR structure and formatting and can't understand the meaning behind your code or changes.
- Danger is not a substitute for human code reviews; it's still important to request a code review from your colleagues.
- To manually retry these Danger checks, please navigate to the Actions tab and re-run last Danger workflow.

Review and merge process you can expect ...


We do welcome contributions in the form of bug reports, feature requests and pull requests via this public GitHub repository.

This GitHub project is public mirror of our internal git repository

1. An internal issue has been created for the PR, we assign it to the relevant engineer.
2. They review the PR and either approve it or ask you for changes or clarifications.
3. Once the GitHub PR is approved, we synchronize it into our internal git repository.
4. In the internal git repository we do the final review, collect approvals from core owners and make sure all the automated tests are passing.
- At this point we may do some adjustments to the proposed change, or extend it by adding tests or documentation.
5. If the change is approved and passes the tests it is merged into the default branch.
5. On next sync from the internal git repository merged change will appear in this public GitHub repository.

Generated by 🚫 dangerJS against 970e037

@andrewleech andrewleech force-pushed the reconnect_in_reset branch from e3e0ecb to c5d7c15 Compare May 24, 2024 03:44
@github-actions github-actions bot changed the title feat(reset): Automatically reconnect if port disconnects during reset. feat(reset): Automatically reconnect if port disconnects during reset. (ESPTOOL-862) May 24, 2024
@andrewleech andrewleech force-pushed the reconnect_in_reset branch from c5d7c15 to 970e037 Compare May 24, 2024 09:48
@radimkarnis
Copy link
Collaborator

Hello @andrewleech,
thank you for your contribution!

The idea of reopening the port when it disappears seems good to me.
I am not sure it should be implemented on the level of individual DTR / RTS transitions, though:

  • If the port disappears in the middle of the reset sequence, the rest of it should not be attempted. With this implementation, the port is reopened and the sequence just continues - which is wasteful.
  • The issue is only happening in the USB-Serial/JTAG and USB-OTG modes. When flashing with a classic USB-to-UART bridge (ClassicReset and UnixTightReset reset strategies), this may not be applicable.

For this reason, I would suggest implementing the wrapper on the ResetStrategy level, what do you think?

Other notes:

  • 20 retries are probably too many. I have tried running the full test suite on our CI with this change and when the chip is not responding (for other reasons than the one you've described) the tests take significantly longer than they used to and eventually time out.
  • We would like to bring this improvement into esptool, but this needs to be tested more. Do you have a reproducible way to try out the OSError: [Errno 5] Input/output error error?

Thank you!

@andrewleech
Copy link
Contributor Author

andrewleech commented Jun 18, 2024

Thanks for the feedback @radimkarnis sorry I didn't get back to you, got pulled away.
I do have more devices on the way that I'll test against but yes I agree it probably makes much more sense to implement this as a ResetStrategy for just the S3 and maybe S2.

I'll think more about how to reproduce the issue I'm trying to solve here - it will only occur on devices where the USB stack / connection gets reset when the chip gets reset. So certainly external usb-serial or internal usb/cdc/jtag stack is not affected.

I'll look into writing a ResetStrategy for this, once I have that done I'll re-open this ticket with the new detail.

@radimkarnis
Copy link
Collaborator

Thank you @andrewleech!

We have merged your solution and expanded it to whole reset strategies. If you can, please try to confirm this still does the trick for you!

@andrewleech
Copy link
Contributor Author

That sounds great thanks! I'll certainly endeavour to test master in the next day or two.

@andrewleech
Copy link
Contributor Author

I've just tested master with a S3 and it's not quite performing the same - it looks like the timeout is enough for my use case.

anl@STEP: ~/micropython/ports/esp32/build-ESP32_GENERIC_S3 $ ls /dev/ttyACM1 
/dev/ttyACM1

anl@STEP: ~/micropython/ports/esp32/build-ESP32_GENERIC_S3 $ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 007: ID 303a:4001 Espressif System Espressif Device
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

anl@STEP: ~/micropython/ports/esp32/build-ESP32_GENERIC_S3 $ /home/anl/.local/pipx/venvs/esptool/bin/esptool.py --chip esp32s3 -p /dev/ttyACM1 -b 460800 --after=hard_reset write_flash --flash_mode dio --flash_freq 80m
 --flash_size 8MB 0x0 bootloader/bootloader.bin 0x10000 micropython.bin 0x8000 partition_table/partition-table.bin
esptool.py v4.7.0
Serial port /dev/ttyACM1
Connecting...

A serial exception error occurred: [Errno 19] could not open port /dev/ttyACM1: [Errno 19] No such device: '/dev/ttyACM1'
Note: This error originates from pySerial. It is likely not a problem with esptool, but with the hardware connection or drivers.
For troubleshooting steps visit: https://docs.espressif.com/projects/esptool/en/latest/troubleshooting.html
(idf5.2_py3.8_env) anl@STEP: ~/micropython/ports/esp32/build-ESP32_GENERIC_S3 $ ls /dev/ttyACM1 
/dev/ttyACM1

anl@STEP: ~/micropython/ports/esp32/build-ESP32_GENERIC_S3 $ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 008: ID 303a:1001 Espressif USB JTAG/serial debug unit
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

anl@STEP: ~/micropython/ports/esp32/build-ESP32_GENERIC_S3 $ /home/anl/.local/pipx/venvs/esptool/bin/esptool.py --chip esp32s3 -p /dev/ttyACM1 -b 460800 --after=hard_reset write_flash --flash_mode dio --flash_freq 80m --flash_size 8MB 0x0 bootloader/bootloader.bin 0x10000 micropython.bin 0x8000 partition_table/partition-table.bin
esptool.py v4.7.0
Serial port /dev/ttyACM1
Connecting...
Chip is ESP32-S3 (QFN56) (revision v0.1)
Features: WiFi, BLE, Embedded Flash 4MB (XMC), Embedded PSRAM 2MB (AP_3v3)
Crystal is 40MHz
MAC: f0:f5:bd:6d:8a:20
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
...

In above log I have a S3 device running micropython updated to handle the RTS/DTR reset

At the start of the log above it's connected at /dev/ttyACM1 as seen in ls / lsusb
I then try to flash with esptool.py installed from current master. It starts to connect then fails with No such device: '/dev/ttyACM1' I then check with ls / lsusb and it does exist and is now connected in bootloader mode. Running the exact same esptool command again works now and flashes just fine.

After this, I switched back to my branch of esptool and the install works first time. The longer timeout during the reset / retries give enough time to wait for the device to restart and reconnect to the PC in bootloader mode.

I then switched back to master and increased the timeout again somewhat and it worked there again too
image

I've used native / in-mcu usb devices a lot over the years and often find it takes up to 3-5 seconds for some devices to be fully enumerated on some PC's. This can blow out even longer if it's being used in a VM or similar! That was my rationale for initially suggesting 10 seconds for the timeout to try to capture all use cases.

I'm pretty sure this timeout only comes into play after esptool has sucessfully connected to the device initially to trigger the reset? If for instance I have a different app connected to the device to start with, esptool immediately exits with a A fatal error occurred: Could not open /dev/ttyACM1, the port is busy or doesn't exist.

For situations like you mentioned previously during a test suite with a not responding device; perhaps the retry loop could check and wait for a more narrow range of errors, to continue to retry when the device is completely missing but error out faster if the device exists but doesn't respond?

@andrewleech
Copy link
Contributor Author

I also had a chance to test with S2 now, with current release esptool it fails in a similar way to the S3 for me

  File "/opt/esp/python_env/idf5.2_py3.10_env/lib/python3.10/site-packages/esptool/reset.py", line 58, in _setDTRandRTS
    fcntl.ioctl(self.port.fileno(), TIOCMSET, struct.pack("I", status))
BrokenPipeError: [Errno 32] Broken pipe

With this updated version (master + increased timeout), it still failed!

Looks to be a different problem, in some way though. The S2 takes a long time to reset after esptool runs the RTS/DTR reset sequence, the USB bus / connection flashes up and down a couple of times before leaving the device reset back in application mode again, not in the bootloader. At this point, the new reset logic seems to just retry again, resetting it again, after which it still ends up back in application mode.... until timeout is hit.

I tried adding --before=usb_reset to esptool command and confirmed it now uses USBJTAGSerialReset but this does absolutely nothing? Running this when the S2 is in application mode (micropython) doesn't reset the device at all for me.

Do you know if an application code needs to implement something specific for S2 / usb_serial_jtag_cdc hardware to handle RTS/DTR reboots to bootloader? I can't see anything in official S2 docs or Arduino HWCDC that suggests as such?

@igrr
Copy link
Member

igrr commented Jun 19, 2024

Do you know if an application code needs to implement something specific for S2 / usb_serial_jtag_cdc hardware to handle RTS/DTR reboots to bootloader?

Please note that the S2 doesn't have USB_SERIAL_JTAG peripheral, so usb_reset is not applicable to it.

@andrewleech
Copy link
Contributor Author

Please note that the S2 doesn't have USB_SERIAL_JTAG peripheral, so usb_reset is not applicable to it.

Oh whoops my bad, yes I was thinking it was "only USB_SERIAL_JTAG" rather than "only native stack", of course it's the Cx chips with only USB_SERIAL_JTAG.

@radimkarnis
Copy link
Collaborator

I've used native / in-mcu usb devices a lot over the years and often find it takes up to 3-5 seconds for some devices to be fully enumerated on some PC's. This can blow out even longer if it's being used in a VM or similar! That was my rationale for initially suggesting 10 seconds for the timeout to try to capture all use cases.

Thanks for the explanation, we will consider extending the timeout to cover these cases!

I'm pretty sure this timeout only comes into play after esptool has sucessfully connected to the device initially to trigger the reset? If for instance I have a different app connected to the device to start with, esptool immediately exits with a A fatal error occurred: Could not open /dev/ttyACM1, the port is busy or doesn't exist.

That's right, if an error happens during the first instance of the port being open, this retry mechanism is not applied.

peterdragun pushed a commit to peterdragun/esptool that referenced this pull request Jul 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants