e8450 acting strange after serial recovery. Missing 5ghz radio and new ssh key every reboot. #198

Open
mumenwrt opened this issue Oct 12, 2024 · 16 comments

Comments

@mumenwrt

Hello. I've been racking my brain trying to get my e8450 router to work after a sudden OKD. I have two e8450 routers that were affected by OKD. I was able to restore both over a serial connection by following the directions on the OpenWrt wiki page. Router A appears to be working fine after serial recovery.

However, Router B is not: it is missing the 5GHz radio and behaves strangely, setting up a new network every time it reboots. For example, my Windows PC treats Router B as a new network after every reboot; I am now on network 24. A new SSH key is also generated each time Router B reboots, so to SSH into Router B I first have to delete the old key.

I have tried every combination of openwrt versions from 22.03.x through 23.05.x and even snapshot builds, but to no avail. I was thinking of reverting Router B back to stock firmware, but I do not have the backup bootchain mtd bin files.

Any assistance would be greatly appreciated.

Attached below are system logs, kernel logs, an mtdblock2 hexdump, and fw_printenv output. The outputs are from the currently installed version, OpenWrt 23.05.5 r24106-10cc5fcd00. Let me know if any other info is needed.

env_e8450.txt
kernel_log_e8450.txt
mtdblock2_e8450.txt
systemlog_e8450.txt

@hufflepuffhavoc

I'm experiencing the same issue: the 5GHz radio doesn't work after serial recovery, and it seems to have something to do with PCI. Another thing I've noticed is that I can't install the v1.1.3 ubi installer: when it reboots I get stuck at the deferred probe in the logs I saw over SSH, and after another reboot the device is bricked again. Hopefully there's a fix for it. I can provide logs if needed, although I don't know what they mean at all.

@mumenwrt
Author

Hopefully, you made a backup of the stock RT3200 or e8450 firmware bootchain (mtd0 - mtd3). You can then use these files to flash your bricked router back to stock firmware. The procedure for this is on the main page of dangowrt's GitHub. At that point, you should be able to flash OpenWrt using the Release v1.0.2 ubi installer dangowrt made.

If you have a backup of the bootchain files from the stock e8450/rt3200 firmware (made before upgrading to OpenWrt), I would greatly appreciate it if you could share them. I am also in need of them.

@hufflepuffhavoc

I am able to get it working on 1.0.2, as that's what I used for the initial serial recovery; it's mainly the 5GHz radio that's the concern for me. And unfortunately no, I never made a backup of the stock firmware.

@mumenwrt
Author

mumenwrt commented Oct 18, 2024

I'm in the same boat. From this post #137, I believe our ubi or flash layout is corrupted and we need to flash properly working mtdx files onto them again. From all I've read, the easiest way to fix this is to restore the router to its stock firmware state and then flash up to the appropriate OpenWrt ubi layout.

Option 2: If you have a second working router, you could back up those bootchain files and flash them to your bricked router. However, you will have to edit the mtd2 file to reflect the original MAC address of the bricked router.
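A minimal sketch of collecting the four bootchain files from a working router, assuming they are readable as mtd0 to mtd3 under some source directory. The function name `backup_bootchain`, the directory arguments and the `SHA256SUMS` file name are illustrative and not part of any installer.

```shell
# usage: backup_bootchain <source_dir> <dest_dir>
# Copies mtd0..mtd3 from source_dir and records checksums next to the copies.
backup_bootchain() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for n in 0 1 2 3; do
        # Refuse to produce a partial backup.
        [ -e "$src/mtd$n" ] || { echo "missing $src/mtd$n" >&2; return 1; }
        dd if="$src/mtd$n" of="$dest/mtd$n" bs=65536 2>/dev/null || return 1
    done
    # Checksums make it possible to verify the files after moving them off the router.
    (cd "$dest" && sha256sum mtd0 mtd1 mtd2 mtd3 > SHA256SUMS)
}
```

After copying the destination directory to a PC, `sha256sum -c SHA256SUMS` run inside it confirms the files arrived intact.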

Unfortunately, I did not save those bootchain files nor do I have access to a second working router...

@hufflepuffhavoc

Hopefully someone sees this and can point us to that stock bootchain backup. I was so excited to get on OpenWrt for the first time that a backup was the least of my concerns. Oh well.

@dangowrt
Owner

This issue sounds like you have wiped the factory partition or UBI volume. This can happen quite easily if you ignore the warning and flash a snapshot image or load a snapshot initramfs without having run the updated v1.1.3 installer. Moving back won't help then, either. Now, in order to fix this, you will need a backup of the partition content, or you will need to re-create it from data taken from a donor device by inserting the correct MAC addresses using a hex editor.
In short:

  • OpenWrt 22.03 and 23.05 expect the factory data to be inside an MTD partition called factory. Due to the inherent unreliability of the SPI-NAND chip, MediaTek uses a proprietary method (called NMBM) for bad block relocation, but no scrubbing is done even if the BCH engine reports correctable errors. In OpenWrt with UBI layout we avoid using NMBM as UBI can do the job much better. Yet, in order to remain (more or less) compatible with the stock firmware, the factory data was still kept inside an MTD partition.

  • OpenWrt snapshots and the upcoming 24.10 release expect the factory data to be inside a UBI volume called factory. Installer versions v1.1.x carry out the relocation: they extract the data from the former factory MTD partition and create a UBI volume for it. Also, as most parts of the bootloader can be moved into UBI, the MTD layout is further simplified to only bl2 and ubi. This has the advantage that all areas are now under UBI supervision, which means relocation and scrubbing are triggered automatically once BCH reports bit errors.

Now, on a device which experienced OKD (a bug in ARM TrustedFirmware-A bl2 which was fixed a few months ago), you need to be careful not to wipe the factory data by booting or installing a snapshot image on a device which previously ran 23.05.x or an earlier version of OpenWrt.
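One way to see which of the two layouts a running system has is to look for a factory entry among the MTD partitions and among the UBI volumes. The sketch below assumes the usual Linux locations `/proc/mtd` and `/sys/class/ubi`; the function name `factory_location` is illustrative, and both paths can be overridden, which also makes it possible to try the function on a PC.

```shell
# usage: factory_location [mtd_table] [ubi_sysfs_dir]
# Prints "mtd", "ubi" or "none" depending on where a "factory" entry is found.
factory_location() {
    mtd_file=${1:-/proc/mtd}
    ubi_dir=${2:-/sys/class/ubi}
    # /proc/mtd lists partition names in double quotes.
    if grep -q '"factory"' "$mtd_file" 2>/dev/null; then
        echo mtd
        return 0
    fi
    # Each UBI volume exposes its name in <ubi_dir>/ubiX_Y/name.
    for name in "$ubi_dir"/ubi*_*/name; do
        [ -f "$name" ] || continue
        if [ "$(cat "$name")" = factory ]; then
            echo ubi
            return 0
        fi
    done
    echo none
    return 1
}
```

A result of `mtd` corresponds to the layout described for 22.03 and 23.05, and `ubi` to the layout described for snapshots and 24.10. A result of `none` would suggest the factory data is gone.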

All versions of the installer back up the bootchain and move that backup into a UBI volume called boot_backup. So your first step would be examining the content of the boot_backup volume by mounting it while running the respective recovery initramfs image matching the previously installed OpenWrt version:

mount -t ubifs ubi0:boot_backup /mnt
cd /mnt
ls -l

Depending on which version of the installer was used, what you are going to see there are either 4 files (mtd0, mtd1, mtd2, mtd3) for installer v1.0.x or 2 files (mtd0, mtd1) for installer v1.1.x.
If you see 4 files, the factory data is inside the file mtd2.
If you see 2 files, the factory data is inside the file mtd1 at offset 0x140000.
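The two cases above can be sketched as a small helper that pulls the factory data out of a copy of the backup directory. The function name `extract_factory` and the output file are illustrative. In the two-file case the sketch copies everything from offset 0x140000 to the end of mtd1, because the length of the factory data is not stated in this thread, so the result may contain trailing data beyond it.

```shell
# usage: extract_factory <backup_dir> <output_file>
extract_factory() {
    dir=$1
    out=$2
    if [ -f "$dir/mtd2" ] && [ -f "$dir/mtd3" ]; then
        # Four-file backup (installer v1.0.x): mtd2 is the factory data.
        cp "$dir/mtd2" "$out"
    elif [ -f "$dir/mtd1" ]; then
        # Two-file backup (installer v1.1.x): factory data starts at 0x140000.
        # 0x140000 = 20 * 65536, so skip 20 blocks of 64 KiB.
        dd if="$dir/mtd1" of="$out" bs=65536 skip=20 2>/dev/null
    else
        echo "no usable backup files in $dir" >&2
        return 1
    fi
}
```

For example, `extract_factory /mnt /tmp/factory.bin` would work on the mounted boot_backup volume, with `/tmp/factory.bin` being an arbitrary output path.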

@hufflepuffhavoc

Hi, thanks for your reply. After the first serial recovery (at which point I was on 23.05.3), I ran multiple different versions of the installer. Anything 1.1.x never worked and would brick the device, and I would then load everything you can from the U-Boot menu, whether bl2, bl31, recovery or production (clearly I have no idea what I'm doing and was just trying different things to get that 5GHz radio working). And so, just to understand: at this point, will examining the backup partition still be of any use? Or is a donor device the only way?

@mumenwrt
Author

When I installed OpenWrt on both routers I think I used v1.1.3, or whatever the latest one available at the time was (these have since been removed from your GitHub). That being said, I went ahead and checked my router A which appears to be working and it does have mtd0, mtd1, mtd2, and mtd3.

Although I am unsure whether Router A is stable, I will try to flash those files to the bricked router, as I don't have many options since I did not back up the bootchain files before flashing OpenWrt. I will update on whether it works.

@dangowrt
Owner

And so, just to understand: at this point, will examining the backup partition still be of any use?

It depends on what you find in those backups. If you have all 4 files in boot_backup, see if hexdump -C /dev/mtd2 looks like 22 76 ..., i.e. calibration data. If it looks like UBI# then it's not useful.
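The same check can be scripted against a copy of the file on a PC. This is a rough sketch: it only looks at the first four bytes, treats a leading `22 76` as calibration data as described above, and treats the ASCII bytes `UBI#` (the magic of a UBI erase-counter header) as UBI content. The function name `classify_factory` is illustrative, and `od` is assumed to be available; on the router itself `hexdump` may be the only tool present.

```shell
# usage: classify_factory <file>
# Prints "calibration", "ubi" or "unknown" based on the first bytes of the file.
classify_factory() {
    # First four bytes as a lowercase hex string without spaces.
    magic=$(dd if="$1" bs=1 count=4 2>/dev/null | od -An -tx1 | tr -d ' \n')
    case $magic in
        2276*)    echo calibration ;;
        55424923) echo ubi ;;        # "UBI#" in ASCII
        *)        echo unknown ;;
    esac
}
```

A result of `unknown` says nothing either way and still calls for a manual look with `hexdump -C`.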

@dangowrt
Owner

That being said, I went ahead and checked my router A which appears to be working and it does have mtd0, mtd1, mtd2, and mtd3.

If you have all 4 files in boot_backup, that means you ran installer v1.0.x (whatever you see in /dev/ doesn't mean anything in that regard). As you only ran installer v1.0.x, you cannot run snapshot images or upgrade to the future OpenWrt 24.10.0 before running the v1.1.3 installer. Updating any part of the bootchain manually will not work; the installer does more than just that.

@mumenwrt
Author

Yes, the boot_backup mtdx files were present on my Router A. I tried to use them to get back to stock firmware, but had no luck. However, I was able to use the serial UART boot menu and flash Router B with the v1.0.2 sysupgrade itb file and the preloader.bin and uboot.fip files from the OpenWrt wiki. This is the same method I used to recover Router A.

It looks like my bricked Router B is now a clone of Router A. I wonder why my initial serial recovery worked on Router A but failed on Router B.

As you only ran installer v1.0.x, you cannot run snapshot images or upgrade to the future OpenWrt 24.10.0 before running the v1.1.3 installer

I am interested in running OpenWrt 24.10 when it comes out. Are you saying I will not be able to upgrade in the future?

@dangowrt
Owner

Are you saying I will not be able to upgrade in the future?

You are absolutely able to upgrade in the future, or now, should you want to run snapshots. All you have to do is flash the unsigned v1.1.3 installer, which will take care of converting the flash layout by moving fip and factory into UBI volumes.

@dangowrt
Owner

It looks like my bricked Router B is now a clone of Router A.

You will have to edit the factory backup in a hex-editor and fix the MAC addresses...
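For those who prefer a script over a hex editor, the sketch below overwrites six bytes in a copy of the factory backup with a given MAC address. The function name `write_mac` is illustrative, and the offset is deliberately left to the caller: this thread does not say where the MAC addresses sit inside the factory data, so it has to be determined first, for example by locating the donor's MAC in a hexdump.

```shell
# usage: write_mac <file> <offset> <aa:bb:cc:dd:ee:ff>
# Overwrites 6 bytes at <offset> (decimal or 0x-prefixed hex) without truncating the file.
write_mac() {
    file=$1
    offset=$2
    mac=$3
    echo "$mac" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || {
        echo "invalid MAC address: $mac" >&2
        return 1
    }
    # Build octal escapes for printf, since \x escapes are not portable.
    esc=''
    for byte in $(echo "$mac" | tr ':' ' '); do
        esc="$esc\\$(printf '%03o' "0x$byte")"
    done
    # $(( )) converts a 0x-prefixed offset to the decimal value dd expects.
    printf "$esc" | dd of="$file" bs=1 seek=$(($offset)) conv=notrunc 2>/dev/null
}
```

Work on a copy of the backup and compare `hexdump -C` output before and after, so a wrong offset is caught before anything is flashed.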

@mumenwrt
Author

Thanks for clearing things up about the future update. I will edit the MAC address at a later time. I wanted to know if I could get my router back to working order first.

@frycss frycss mentioned this issue Nov 23, 2024
@frycss

frycss commented Nov 23, 2024

After flashing the wrong firmware installer, my router is missing the 5GHz radio too. Now I'm trying to restore the stock firmware to start afresh, but I'm encountering this error:

# ubidetach -d 0
ubidetach: error!: cannot remove ubi0
           error 16 (Resource busy)

Any suggestions on how I can proceed from here?

Edit: solved by flashing openwrt-22.03.3-mediatek-mt7622-linksys_e8450-ubi-initramfs-recovery.itb first.
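The thread does not say why ubi0 was busy. One plausible cause, and only an assumption here, is that a volume on it was still mounted by the running system, which would fit with the recovery initramfs (running from RAM) resolving it. A small sketch for listing mounts backed by UBI, with the illustrative function name `ubi_mounts`:

```shell
# usage: ubi_mounts [mounts_file]
# Prints "device mountpoint" for every mount whose device is a UBI volume or ubiblock.
ubi_mounts() {
    awk '$1 ~ /^(\/dev\/)?ubi/ { print $1, $2 }' "${1:-/proc/mounts}"
}
```

If this prints anything, those mount points would need to be unmounted before `ubidetach` can be expected to succeed.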

@zaventh

zaventh commented Jan 15, 2025

I originally ran the v1.0.2 installer and flashed up to OpenWrt 23.05.4. All was working well. When looking to upgrade to 23.05.5, I read this warning on the OpenWrt wiki:

Users already running OpenWrt 23.05.x or older, or snapshots before 2024-02-15 also need to re-run the installer yet another time to move fip and factory partitions into UBI volumes before running snapshots after 2024-02-15 as well as upcoming releases. Upgrading to release 23.05.4 does NOT require that you re-run the installer.

And also:

Upgrading an UBI installation to new releases after 2024-02 (Includes ALL SNAPSHOTS, 24.10-SNAPSHOTs, 24.10.0-rcX releases and all releases in the foreseeable future) (emphasis mine)

I took this to mean 23.05.5 required the newer v1.1.3 installer and followed the instructions on that page to run it. Things mostly appeared to work as expected, except my 5GHz radio failed to load:

# dmesg | grep mt7915e
[    7.239671] mt7915e 0000:01:00.0: assign IRQ: got 146
[    7.244834] mt7915e 0000:01:00.0: enabling device (0000 -> 0002)
[    7.250930] mt7915e 0000:01:00.0: enabling bus mastering
[    7.502854] mt7915e 0000:01:00.0: HW/SW Version: 0x8a108a10, Build Time: 20220929104113a
[    7.695826] mt7915e 0000:01:00.0: WM Firmware Version: ____000000, Build Time: 20220929104145
[    7.749189] mt7915e 0000:01:00.0: WA Firmware Version: DEV_000000, Build Time: 20220929104205
[    7.865096] mt7915e 0000:01:00.0: eeprom load fail, use default bin
[    7.871536] mt7915e 0000:01:00.0: Direct firmware load for mediatek/mt7915_eeprom.bin failed with error -2
[    7.881244] mt7915e 0000:01:00.0: Falling back to sysfs fallback for: mediatek/mt7915_eeprom.bin
[    7.897652] mt7915e: probe of 0000:01:00.0 failed with error -12

Flashing back to 23.05.4 does not fix the issue. However, flashing OpenWrt 24.10.0-rc5 does resolve it. I understand now that there was never a reason to upgrade the bootloader for 23.05.5, and in fact it is not backwards compatible. Luckily, the v24 RCs appear to be stable and functional enough for my use.
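The telling line in the log above is `eeprom load fail`, followed by the failed request for mediatek/mt7915_eeprom.bin (error -2 is ENOENT, file not found) and the probe failure. That pattern would be consistent with the driver not finding calibration data where the running release expects it. A tiny sketch for spotting it in a kernel log, with the illustrative function name `eeprom_load_failed`:

```shell
# usage: dmesg | eeprom_load_failed
# Succeeds (exit status 0) if the mt7915e driver reported an EEPROM load failure.
eeprom_load_failed() {
    grep -q 'mt7915e.*eeprom load fail'
}
```

For example, `dmesg | eeprom_load_failed && echo "5GHz calibration data not found"` prints the message only when the failure line is present.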
