Show Posts

Messages - hiryu

1
Firmware / Re: Flashing corrupted BMC firmware
« on: November 18, 2023, 05:20:04 pm »
Seems like the BMC Write Protect switch was possibly just _slightly_ off... It's the only thing I touched, and touching it fixed the issue.

As of about 2 hours ago, my Talos II system is in its new colo and working fine!

Hopefully this helps anyone who runs into the same or a similar issue!

2
Firmware / Re: Flashing corrupted BMC firmware
« on: November 17, 2023, 09:28:56 am »
Yesterday, it started working again completely unexpectedly.

Long story short, I was having trouble flashing the BMC chip over serial; it seemed I couldn't write to it. I unplugged the system and set the BMC write protect switch to off (which I hadn't tried yet), but flashrom treated the BMC chip as unavailable for update. That state was left over from the last failed flashing attempt, and it had also happened after my earlier failed attempts. So I unplugged the system again, set the BMC write protect back on and FPGA RUN/RESET back to RUN, and plugged it back in (in an attempt to "unlock" the chip)... And before I had the chance to unplug it and toggle both switches back to try flashrom again, I noticed that the BMC firmware had actually booted up!

It was only by luck that I was monitoring the BMC over serial and had even noticed it had started working again.

The only difference from the other attempts is that this was the first time I had toggled the BMC write protect switch (previously I had only toggled the FPGA RUN/RESET switch, as documented here: https://wiki.raptorcs.com/wiki/Debricking_the_BMC#Flash_new_BMC_firmware_via_serial_port_.28Open_Source_Method.29).

Since I had recently had the system shipped back to me from a colo facility, I'm starting to think this tiny BMC write protect switch was jostled very slightly in transit. I'll try unplugging and replugging the power at least a few more times today and report back on the results.

Before it started working, I had decided to try and re-seat the BMC flash chip. I didn't have enough time to really work on it and the latch was pretty secure... And it started working before I was able to go back and make a serious attempt.

Anyway, hopefully it stays working. I'll keep this thread updated with the ultimate conclusion for anyone else running into this issue.

3
Firmware / Flashing corrupted BMC firmware
« on: November 16, 2023, 09:45:54 am »
After plugging in my Talos II board, all I saw on the serial port was this, repeated periodically:
Code:
DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3607/DH:0.3803 CK (min:0.30)

Only some of the numbers on the last line would change.

I decided to try this method:
https://wiki.raptorcs.com/wiki/Debricking_the_BMC#Flash_new_BMC_firmware_via_serial_port_.28Open_Source_Method.29

Running this all night (from my Blackbird), I am getting these messages:
Code:
./flashrom --verbose --programmer 'ast2400:serial=/dev/ttyUSB0,cpu=halt,spibus=0' -c MX25L25635F/MX25L25645E/MX25L25665E -w image-bmc
flashrom d8f8f68 on Linux 6.5.11 (ppc64le)
flashrom is free software, get the source code at https://flashrom.org

flashrom was built with libpci 3.7.0, GCC 11.4.0, little endian
Command line (7 args): flashrom --verbose --programmer ast2400:serial=/dev/ttyUSB0,cpu=halt,spibus=0 -c MX25L25635F/MX25L25645E/MX25L25665E -w image-bmc
Calibrating delay loop... OS timer resolution is 1 usecs, 1887M loops per second, 10 myus = 10 us, 100 myus = 100 us, 1000 myus = 997 us, 10000 myus = 9992 us, 4 myus = 4 us, OK.
Initializing ast2400 programmer
No password specified with aspeed_vendor_backdoor_password, falling back to default.
Sending vendor serial backdoor password... done.
Waiting for response... done.
Detected 115200 baud interface
Configuring P2A bridge for SCU access
Configuring P2A bridge for WDT access
Configuring P2A bridge for SMC access
Enabling CE0 write
Using CE0 offset 0x00000000
The following protocols are supported: SPI.
Probing for Macronix MX25L25635F/MX25L25645E/MX25L25665E, 32768 kB: probe_spi_rdid_generic: id1 0xc2, id2 0x2019
Found Macronix flash chip "MX25L25635F/MX25L25645E/MX25L25665E" (32768 kB, SPI) on ast2400.
Chip status register is 0x00.
Chip status register: Status Register Write Disable (SRWD, SRP, ...) is not set
Chip status register: Bit 6 is not set
Chip status register: Block Protect 3 (BP3) is not set
Chip status register: Block Protect 2 (BP2) is not set
Chip status register: Block Protect 1 (BP1) is not set
Chip status register: Block Protect 0 (BP0) is not set
Chip status register: Write Enable Latch (WEL) is not set
Chip status register: Write In Progress (WIP/BUSY) is not set
This chip may contain one-time programmable memory. flashrom cannot read
and may never be able to write it, hence it may not be able to completely
clone the contents of this chip (see man page for details).
Switched to 4-bytes addressing mode.
Reading old flash chip contents... done.
Erasing and writing flash chip... Trying erase function 0...  0%0x000000-0x000fff:EFAILED at 0x00000000! Expected=0xff, Found=0xf8, failed byte count from 0x00000000-0x00000fff: 0xf98
ERASE FAILED!
Reading current flash chip contents... 32%

That seems bad... Is there anything I can do differently on my end, or do I need to replace the flash chip?
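For reference, the two checks in that log can be spelled out. Status register 0x00 means none of the listed bits (SRWD, BP3..BP0, WEL, WIP) were set. The erase step expects every byte of the first 4 KiB sector (0x000000-0x000fff, 4096 bytes) to read back as 0xFF afterwards. Instead, 0xf98 = 3992 of them did not, so only 104 bytes were actually erased. Below is a minimal C sketch of both checks. It assumes the bit order flashrom printed (bit 7 first). The helper names are hypothetical, and this is not flashrom's own code:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Status register bit names, bit 7 first, in the order flashrom printed them. */
static const char *const status_bits[8] = {
    "Status Register Write Disable (SRWD)", /* bit 7 */
    "Bit 6",                                /* bit 6 */
    "Block Protect 3 (BP3)",                /* bit 5 */
    "Block Protect 2 (BP2)",                /* bit 4 */
    "Block Protect 1 (BP1)",                /* bit 3 */
    "Block Protect 0 (BP0)",                /* bit 2 */
    "Write Enable Latch (WEL)",             /* bit 1 */
    "Write In Progress (WIP)",              /* bit 0 */
};

/* Print every bit of the status byte; return 1 if any block-protect bit is set. */
static int decode_status(uint8_t sr) {
    for (int i = 0; i < 8; i++) {
        int bit = 7 - i;
        printf("%s is %s\n", status_bits[i], ((sr >> bit) & 1) ? "set" : "not set");
    }
    return (sr & 0x3C) != 0; /* BP3..BP0 occupy bits 5..2 */
}

/* An erased sector should read back as all 0xFF; count the bytes that don't. */
static size_t count_erase_failures(const uint8_t *buf, size_t len) {
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != 0xFF)
            bad++;
    return bad;
}
```

Reading back the sector after the failed erase and passing it to `count_erase_failures` would give the same kind of number flashrom reported (0xf98 in this case).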


4
Firmware / Re: annoying missing sshd corner case with manually edited users
« on: February 21, 2020, 03:25:45 pm »
Here's the code to compile in order to disable the FPGA watchdog on a Blackbird system:
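#include <stdint.h>

int main(void) {
    /* Physical addresses of two BMC GPIO registers; this only works where
       these addresses are directly accessible (no MMU translation). */
    volatile uint32_t *gpio_ctl_reg = (volatile uint32_t *)0x1e780024;
    volatile uint32_t *gpio_data_reg = (volatile uint32_t *)0x1e780020;

    *gpio_ctl_reg |= 0x00010000;    /* set bit 16 in the control register */
    *gpio_data_reg &= ~0x00010000u; /* clear bit 16 in the data register */
    return 0;
}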

Otherwise the directions here work:
https://wiki.raptorcs.com/wiki/Debricking_the_BMC/Watchdog
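To make the two register updates in that program explicit, the same read-modify-write can be modeled on plain integers instead of hardware registers. This is only an illustration of the bit arithmetic and does not touch the BMC. `GPIO_BIT16`, `set_bits` and `clear_bits` are hypothetical names:

```c
#include <stdint.h>

#define GPIO_BIT16 0x00010000u /* the mask used in the program above */

/* Model of `*gpio_ctl_reg |= mask`: set the masked bit, keep all others. */
static uint32_t set_bits(uint32_t reg, uint32_t mask) { return reg | mask; }

/* Model of `*gpio_data_reg &= ~mask`: clear the masked bit, keep all others. */
static uint32_t clear_bits(uint32_t reg, uint32_t mask) { return reg & ~mask; }
```

For example, `set_bits(0x12340000u, GPIO_BIT16)` yields `0x12350000u`, and `clear_bits(0xFFFFFFFFu, GPIO_BIT16)` yields `0xFFFEFFFFu`. Only bit 16 changes in either case, which is why the program can safely OR/AND into registers that hold other GPIO state.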

5
Firmware / Re: annoying missing sshd corner case with manually edited users
« on: February 21, 2020, 03:24:17 pm »
So I decided to try and reflash the BMC using the risky technique described under known issues:
https://wiki.raptorcs.com/wiki/Talos_II/Firmware/2.00/Release_Notes

I removed /etc/passwd and /etc/group and immediately rebooted... It didn't work.

The good news is that after 4-5 hours, I finally figured out how to make tftpboot work from U-Boot. The instructions on the wiki aren't 100% correct.

I'd like to contribute the fixes.

Two things off the top of my head:
a. Disabling the FPGA watchdog on the Blackbird isn't the same as on the Talos II. The procedure is the same, but the code differs because the Blackbird uses different addresses.
b. The instructions to download and boot the BMC image aren't quite right.

6
Firmware / annoying missing sshd corner case with manually edited users
« on: February 21, 2020, 12:15:33 am »
After some back and forth on twitter, the notes have been updated:
https://twitter.com/RaptorCompSys/status/1230723235961917440

But here's a rundown of how this seemed to happen for me.
a. /etc/passwd- was giving me a random "stale file handle" error, which prevented me from writing to /etc/passwd, because /etc/passwd could not be backed up to /etc/passwd-.
b. Restarting the BMC fixed this, so perhaps the "stale file handle" issue is intermittent?
c. Ultimately, the issue is that /etc/passwd is persistent and I had added a user, so my edited copy was kept. The new version of the file, found at /run/initramfs/ro/etc/passwd, has the sshd user.
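A quick way to confirm the mismatch in (c) is to check both copies of the file for an `sshd:` entry. Below is a minimal C sketch. It assumes standard passwd-format lines shorter than 512 bytes, and the helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if a passwd-format stream has an entry whose first field is `name`.
   Lines look like "name:x:uid:gid:gecos:home:shell". */
static bool stream_has_user(FILE *f, const char *name) {
    char line[512];
    size_t n = strlen(name);
    while (fgets(line, sizeof line, f)) {
        /* Require the ':' so that "ssh" does not match "sshd". */
        if (strncmp(line, name, n) == 0 && line[n] == ':')
            return true;
    }
    return false;
}

/* Same check on a path; a missing or unreadable file counts as "not found". */
static bool file_has_user(const char *path, const char *name) {
    FILE *f = fopen(path, "r");
    if (!f)
        return false;
    bool found = stream_has_user(f, name);
    fclose(f);
    return found;
}
```

On an affected BMC, `file_has_user("/etc/passwd", "sshd")` would be expected to return false while `file_has_user("/run/initramfs/ro/etc/passwd", "sshd")` returns true.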

From here... it's clear that useradd is _really_ broken. It complains:
useradd: PAM: Permission denied

Probably needs to be built without PAM support?

adduser won't list the options it accepts, and it's the BusyBox version, so who knows? Turns out it seems to _mostly_ have parity with Debian 10's adduser.

Here is how I ultimately was able to get this working:
1. addgroup --system sshd
2. adduser --system --home /var/run/sshd --shell /bin/false sshd (ignore the error about the sshd group)
3. usermod -g sshd sshd
4. chown root:root /var/run/sshd
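After running those four steps, the result can be double-checked through the normal account-lookup calls. Below is a minimal C sketch using POSIX `getpwnam`, `getgrnam` and `stat`. It assumes the user, group and paths from the steps above, and the helper names are hypothetical:

```c
#include <grp.h>
#include <pwd.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Step 3: user exists and its primary group is `group`. */
static bool user_in_primary_group(const char *user, const char *group) {
    struct passwd *pw = getpwnam(user);
    struct group *gr = getgrnam(group);
    return pw != NULL && gr != NULL && pw->pw_gid == gr->gr_gid;
}

/* Step 2: user has the expected home directory and login shell. */
static bool user_has_home_and_shell(const char *user, const char *home, const char *shell) {
    struct passwd *pw = getpwnam(user);
    return pw != NULL && strcmp(pw->pw_dir, home) == 0 && strcmp(pw->pw_shell, shell) == 0;
}

/* Step 4: path is a directory owned by root:root. */
static bool dir_owned_by_root(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0)
        return false;
    return S_ISDIR(st.st_mode) && st.st_uid == 0 && st.st_gid == 0;
}
```

On the BMC, `user_in_primary_group("sshd", "sshd")`, `user_has_home_and_shell("sshd", "/var/run/sshd", "/bin/false")` and `dir_owned_by_root("/var/run/sshd")` should all be true after the steps.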

Ideally, the above steps should be performed BEFORE the upgrade, to avoid having to hook up a serial cable.

My steps won't produce the same UID/GID as in the release notes; they use the next available system UID/GID instead, which works just as well. You could also modify the user afterwards, or pass extra CLI switches, to match Raptor's settings.
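To illustrate what "next available system UID/GID" means: Debian's adduser documents picking the first free ID in its system range (100-999 by default). The BusyBox build on the BMC may be configured with a different range, so the constants below are assumptions. The third colon-separated field is the UID in /etc/passwd and the GID in /etc/group, so the same sketch works for both files. The function name is hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed system-ID range; check the adduser/BusyBox configuration in use. */
#define FIRST_SYSTEM_ID 100
#define LAST_SYSTEM_ID  999

/* Lowest ID in [FIRST_SYSTEM_ID, LAST_SYSTEM_ID] not used in the third field
   of any line of a passwd- or group-format stream, or -1 if all are taken. */
static long next_free_system_id(FILE *f) {
    char used[LAST_SYSTEM_ID - FIRST_SYSTEM_ID + 1] = {0};
    char line[512];
    while (fgets(line, sizeof line, f)) {
        /* Skip the name and password fields to reach the numeric ID. */
        char *p = strchr(line, ':');
        if (p)
            p = strchr(p + 1, ':');
        if (!p)
            continue;
        char *end;
        long id = strtol(p + 1, &end, 10);
        if (end != p + 1 && id >= FIRST_SYSTEM_ID && id <= LAST_SYSTEM_ID)
            used[id - FIRST_SYSTEM_ID] = 1;
    }
    for (long id = FIRST_SYSTEM_ID; id <= LAST_SYSTEM_ID; id++)
        if (!used[id - FIRST_SYSTEM_ID])
            return id;
    return -1;
}
```

With UIDs 100, 101 and 103 already taken, this returns 102. That is also why the sshd UID/GID created this way can differ from the values in the release notes.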

(edited to make minor fix to step 2 adduser command)
