poed gets wedged and cannot re-sync #18

Open
rothcar opened this issue Nov 19, 2021 · 2 comments

Comments

@rothcar
Contributor

rothcar commented Nov 19, 2021

I had to insert some debug statements into the driver to illustrate this. Intermittently, some of the devices get into a state where the serial link to the PoE chipset breaks down and cannot re-sync. Restarting poed does not fix the issue, and neither does a 'restore' (PoE chipset factory reset). Anecdotally, I have found that rebooting the device can correct the issue.

Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Failed to get system power bank
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: MSG: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 af
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Tx key is 2, Rx key should be 82, but received 0
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: MSG: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 af
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Tx key is 2, Rx key should be 82, but received 0
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: MSG: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 af
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Tx key is 2, Rx key should be 82, but received 0
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: MSG: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 af
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Tx key is 2, Rx key should be 82, but received 0
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: MSG: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 af
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Tx key is 2, Rx key should be 82, but received 0
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: MSG: 00 00 00 00 00 00 00 00 00 00 00 00 00 03 af
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Tx key is 2, Rx key should be 82, but received 0
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR: Traceback (most recent call last):
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR:   File "/opt/poeagent/drivers/poe_driver_pd69200.py", line 133, in _communicate
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR:     self._check_rx_msg(rx_msg, tx_msg)
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR:   File "/opt/poeagent/drivers/poe_driver_pd69200.py", line 112, in _check_rx_msg
Nov 19 01:03:40 lab05-ihm-infra-sw1 poed.py[1396]: ERR:     raise RuntimeError("Key field in Tx/Rx message is mismatch")
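
For anyone reading the log above, here is a minimal sketch for inspecting one of the logged `MSG:` payloads offline. It rests on assumptions that are not confirmed in this thread: that a frame is 15 bytes, that the first byte is the key field, and that the last two bytes carry a big-endian 16-bit sum of the first 13 bytes. The helper names `parse_msg` and `checksum_ok` are hypothetical and are not part of the driver.

```python
def parse_msg(hex_text):
    """Turn a logged payload such as '00 00 ... 03 af' into bytes."""
    return bytes(int(token, 16) for token in hex_text.split())


def checksum_ok(frame):
    """Check the trailing checksum of a frame.

    Assumption: 15-byte frame whose last two bytes are the big-endian
    16-bit arithmetic sum of the first 13 bytes.
    """
    if len(frame) != 15:
        return False
    expected = sum(frame[:13]) & 0xFFFF
    received = (frame[13] << 8) | frame[14]
    return expected == received


# Payload copied from the log lines above.
logged = parse_msg("00 00 00 00 00 00 00 00 00 00 00 00 00 03 af")

# Length, first byte (assumed to be the key field), checksum result.
print(len(logged), logged[0], checksum_ok(logged))  # prints: 15 0 False
```

Under those assumptions, the logged reply carries a zero first byte (consistent with "received 0") and a trailing pair that does not match a sum over all-zero data, which suggests the reply is not a well-formed frame rather than a valid frame with the wrong key.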
@leonchiang
Contributor

leonchiang commented Nov 30, 2021

We added the following debug messages to the driver:

  • Start/finish of the routine runtime configuration update loop.
  • RX data check failure in the POE chip communication function.

On a check failure, it prints the raw TX/RX data, the retry count, and the retry data.

We ran 3 test cases in an attempt to trigger the POE chip breakdown:

test1. POE Agent restart loop

  1. start POE agent --> delay 8 seconds --> stop POE agent --> delay 2 seconds --> loop
  2. start POE agent --> delay 15 seconds --> stop POE agent --> delay 2 seconds --> loop
  3. start POE agent --> delay 25 seconds --> stop POE agent --> delay 2 seconds --> loop
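
The restart loop above could be scripted roughly as follows. This is only a sketch: the thread does not say how the agent was started and stopped, so the `systemctl start poed` / `systemctl stop poed` commands and the function name `restart_loop` are assumptions to be replaced with whatever your image uses.

```python
import subprocess
import time


def restart_loop(run_time_s, stop_time_s=2, cycles=3,
                 start_cmd=("systemctl", "start", "poed"),  # assumed command
                 stop_cmd=("systemctl", "stop", "poed"),    # assumed command
                 run=subprocess.run, sleep=time.sleep):
    """Start the agent, let it run, stop it, pause, and repeat."""
    for _ in range(cycles):
        run(list(start_cmd), check=False)
        sleep(run_time_s)       # 8, 15, or 25 seconds in the cases above
        run(list(stop_cmd), check=False)
        sleep(stop_time_s)      # 2 seconds in all three cases
```

Calling `restart_loop(8)`, `restart_loop(15)`, and `restart_loop(25)` would correspond to the three variants. The `run` and `sleep` parameters exist only so the loop can be exercised without touching a real service.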

test2. Same flow as test1, but with the delay between TX/RX commands decreased from 30 ms to 10 ms.
According to the POE chip spec, the delay between TX/RX and between commands must be at least 30 ms, so this case runs out of spec.
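
To make the 30 ms rule concrete, here is a minimal sketch of a pacing helper that enforces a minimum gap between the end of one transfer and the start of the next. The class name `CommandPacer` and its methods are hypothetical; this is not how the driver implements its delay, only one way the requirement could be expressed.

```python
import time


class CommandPacer:
    """Enforce a minimum idle gap between consecutive bus transfers."""

    def __init__(self, min_gap_s=0.030, clock=time.monotonic, sleep=time.sleep):
        self._min_gap_s = min_gap_s
        self._clock = clock
        self._sleep = sleep
        self._last_done = None  # time at which the previous transfer finished

    def wait(self):
        """Block until the gap has elapsed; return the time slept in seconds."""
        if self._last_done is None:
            return 0.0
        remaining = self._min_gap_s - (self._clock() - self._last_done)
        if remaining > 0:
            self._sleep(remaining)
            return remaining
        return 0.0

    def mark(self):
        """Record that a transfer has just finished."""
        self._last_done = self._clock()
```

A caller would invoke `wait()` before each transfer and `mark()` right after it. Setting `min_gap_s=0.010` would reproduce the out-of-spec timing used in this test case.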

test3. Start POE Agent only (TX/RX delay still 10 ms), and run another script that executes the following commands:
poecli set -p 1 -o 0x7530
poecli show -a
sleep 1

The commands above set the power limit, then poll all status from the POE chip, then delay 1 second. These test cases caused some errors in the POE chip communication function,
after which the function retried the TX/RX. After resending the command, it succeeded in every case.
We did not encounter errors like those in your log (which looks like an I2C bus failure). We need more detailed steps to reproduce this issue.
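
For readers following along, the retry behaviour described above can be sketched like this. It is an illustration under assumptions, not the driver's actual code: `communicate_with_retry`, `transfer`, `check`, and `on_retry` are hypothetical names, and the retry count of 3 is arbitrary. The only detail taken from the log is that a failed check raises `RuntimeError`.

```python
def communicate_with_retry(transfer, check, tx_msg, retries=3, on_retry=None):
    """Send tx_msg and resend it until the reply passes the check."""
    last_error = None
    for attempt in range(1, retries + 1):
        rx_msg = transfer(tx_msg)
        try:
            check(rx_msg, tx_msg)   # expected to raise RuntimeError on a bad reply
            return rx_msg
        except RuntimeError as exc:
            last_error = exc
            if on_retry is not None:
                # Hook for debug output: attempt number plus raw TX/RX data.
                on_retry(attempt, tx_msg, rx_msg)
    raise RuntimeError(f"no valid reply after {retries} attempts") from last_error
```

In the test cases above the resend always succeeded, so a loop of this shape would return on the second attempt. In the reported failure every attempt fails the check, so the loop would run out of attempts and raise.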

@leonchiang
Contributor

leonchiang commented Dec 2, 2021

We added a new patch that fully matches the I2C protocol timing requirements in the PD69200 datasheet. It will be
included in the new devel-0001 branch. If you want to try it, please check the devel-0001 branch tag:
https://github.com/leonchiang/poed/releases/tag/issue_fix_0001b
