Topic: Flashing corrupted BMC firmware

hiryu

Flashing corrupted BMC firmware
« on: November 16, 2023, 09:45:54 am »
After plugging in my Talos II board, the only thing I saw on the serial port was this, repeating over and over:
Code:
DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3607/DH:0.3803 CK (min:0.30)

Only some of the numbers on the last line would change.
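
For anyone capturing the same loop to a file, here is a minimal sketch for checking how often the banner repeats and whether the reported read margins are at least the printed minimum. The line format is inferred only from the single sample above, and the function names are made up for illustration; this is not a tool from the wiki.

```python
import re

# Format inferred from the one "Read margin" line quoted in this thread
# (an assumption; other firmware versions may print it differently).
MARGIN_RE = re.compile(
    r"Read margin-DL:(?P<dl>\d+\.\d+)/DH:(?P<dh>\d+\.\d+) CK \(min:(?P<min>\d+\.\d+)\)"
)


def parse_read_margin(line):
    """Return (dl, dh, minimum) as floats, or None if the line does not match."""
    m = MARGIN_RE.search(line)
    if m is None:
        return None
    return float(m["dl"]), float(m["dh"]), float(m["min"])


def count_dram_inits(capture):
    """Count how many times the DRAM init banner appears in captured serial text."""
    return capture.count("DRAM Init")


sample = "Read margin-DL:0.3607/DH:0.3803 CK (min:0.30)"
dl, dh, minimum = parse_read_margin(sample)
# True when both reported margins are at or above the printed minimum.
print(dl >= minimum and dh >= minimum)
```

In the sample line both values are above the printed minimum, so the margin line by itself does not obviously explain the restart loop.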

I decided to try this method:
https://wiki.raptorcs.com/wiki/Debricking_the_BMC#Flash_new_BMC_firmware_via_serial_port_.28Open_Source_Method.29

After running this all night (from my Blackbird), I am getting these messages:
Code:
./flashrom --verbose --programmer 'ast2400:serial=/dev/ttyUSB0,cpu=halt,spibus=0' -c MX25L25635F/MX25L25645E/MX25L25665E -w image-bmc
flashrom d8f8f68 on Linux 6.5.11 (ppc64le)
flashrom is free software, get the source code at https://flashrom.org

flashrom was built with libpci 3.7.0, GCC 11.4.0, little endian
Command line (7 args): flashrom --verbose --programmer ast2400:serial=/dev/ttyUSB0,cpu=halt,spibus=0 -c MX25L25635F/MX25L25645E/MX25L25665E -w image-bmc
Calibrating delay loop... OS timer resolution is 1 usecs, 1887M loops per second, 10 myus = 10 us, 100 myus = 100 us, 1000 myus = 997 us, 10000 myus = 9992 us, 4 myus = 4 us, OK.
Initializing ast2400 programmer
No password specified with aspeed_vendor_backdoor_password, falling back to default.
Sending vendor serial backdoor password... done.
Waiting for response... done.
Detected 115200 baud interface
Configuring P2A bridge for SCU access
Configuring P2A bridge for WDT access
Configuring P2A bridge for SMC access
Enabling CE0 write
Using CE0 offset 0x00000000
The following protocols are supported: SPI.
Probing for Macronix MX25L25635F/MX25L25645E/MX25L25665E, 32768 kB: probe_spi_rdid_generic: id1 0xc2, id2 0x2019
Found Macronix flash chip "MX25L25635F/MX25L25645E/MX25L25665E" (32768 kB, SPI) on ast2400.
Chip status register is 0x00.
Chip status register: Status Register Write Disable (SRWD, SRP, ...) is not set
Chip status register: Bit 6 is not set
Chip status register: Block Protect 3 (BP3) is not set
Chip status register: Block Protect 2 (BP2) is not set
Chip status register: Block Protect 1 (BP1) is not set
Chip status register: Block Protect 0 (BP0) is not set
Chip status register: Write Enable Latch (WEL) is not set
Chip status register: Write In Progress (WIP/BUSY) is not set
This chip may contain one-time programmable memory. flashrom cannot read
and may never be able to write it, hence it may not be able to completely
clone the contents of this chip (see man page for details).
Switched to 4-bytes addressing mode.
Reading old flash chip contents... done.
Erasing and writing flash chip... Trying erase function 0...  0%0x000000-0x000fff:EFAILED at 0x00000000! Expected=0xff, Found=0xf8, failed byte count from 0x00000000-0x00000fff: 0xf98
ERASE FAILED!
Reading current flash chip contents... 32%

That seems bad... Is there anything I can do differently on my side, or do I need to replace the flash chip?
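
To put numbers on the "ERASE FAILED" line, here is a minimal sketch that pulls the fields out of it and works out how much of the sector failed the erase check. The regular expression is written against the one line in the log above, so treat the format as an assumption; the function name is hypothetical.

```python
import re

# Matches the failure report as it appears in the log above (assumed format).
FAIL_RE = re.compile(
    r"FAILED at (0x[0-9a-fA-F]+)! Expected=(0x[0-9a-fA-F]+), Found=(0x[0-9a-fA-F]+), "
    r"failed byte count from (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+): (0x[0-9a-fA-F]+)"
)


def summarize_erase_failure(line):
    """Return a dict describing the failed erase block, or None if no match."""
    m = FAIL_RE.search(line)
    if m is None:
        return None
    addr, expected, found, start, end, failed = (int(x, 16) for x in m.groups())
    block_size = end - start + 1
    return {
        "first_bad_address": addr,
        "block_size": block_size,
        "failed_bytes": failed,
        "failed_fraction": failed / block_size,
        # Bits that differ between the expected and the found byte.
        "differing_bits": expected ^ found,
    }


line = ("0x000000-0x000fff:EFAILED at 0x00000000! Expected=0xff, Found=0xf8, "
        "failed byte count from 0x00000000-0x00000fff: 0xf98")
info = summarize_erase_failure(line)
print(info["failed_bytes"], "of", info["block_size"], "bytes still not 0xff")
```

For the line above this works out to 3992 of 4096 bytes in the first 4 kB block not reading back as 0xff. Nearly the whole block failing looks more like the erase not taking effect at all than like a handful of bad cells, although that reading is a guess from a single log line.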


atomicdog

Re: Flashing corrupted BMC firmware
« Reply #1 on: November 16, 2023, 12:37:45 pm »

I don't know why flashing the chip isn't working, but the firmware may not be corrupted.
I was getting the same DRAM Init-V12-DDR4 messages, and after removing the chip and re-seating it, my Talos II booted up.

hiryu

Re: Flashing corrupted BMC firmware
« Reply #2 on: November 17, 2023, 09:28:56 am »
Yesterday, it started working again completely unexpectedly.

Long story short, I was having trouble flashing the BMC chip over serial; it seemed I couldn't write to it. I unplugged the system and set the BMC write protect switch to off, which I hadn't tried yet. flashrom still treated the BMC chip as unavailable for update, apparently left over from the last failed flash attempt (the same thing had happened after my other failed attempts). So, in an attempt to "unlock" the chip, I unplugged the system again, set BMC write protect back on and FPGA RUN/RESET back to RUN, and plugged it back in. Before I had the opportunity to unplug it and toggle both switches back for another flashrom attempt, I noticed that the BMC firmware had actually booted up!

It was only by luck that I was monitoring the BMC over serial and had even noticed it had started working again.

The only difference from the other attempts is that this was the first time I had toggled the BMC write protect switch. Until then I had only ever toggled the FPGA RUN/RESET switch, as documented here: https://wiki.raptorcs.com/wiki/Debricking_the_BMC#Flash_new_BMC_firmware_via_serial_port_.28Open_Source_Method.29

As I had recently had the system shipped back to me from a colo facility, I'm starting to think this tiny BMC write protect switch was jostled very slightly in transit. I'll try removing and plugging the power back in at least a few more times today and report back on the results.

Before it started working, I had decided to try re-seating the BMC flash chip. I didn't have enough time to really work on it and the latch was pretty secure, and it started working before I was able to go back and make a serious attempt.

Anyway, hopefully it stays working. I'll keep this thread updated with the ultimate conclusion for anyone else running into this issue.

hiryu

Re: Flashing corrupted BMC firmware
« Reply #3 on: November 18, 2023, 05:20:04 pm »
Seems like the BMC Write Protect switch was possibly just _slightly_ off... It's the only thing I touched that fixed the issue.

As of about 2 hours ago, my Talos II system is in its new colo and it's working fine!

Hopefully this helps anyone who runs into the same or a similar issue!