Resetting Talos II to "Factory Settings"

bloudraak:
Is there an easy way to reset a Talos II back to factory settings so I can go through the setup process again?

I changed the passwords per the documentation, but now I can't SSH into OpenBMC or log in through the web interface.

ClassicHasClass:
For a lost BMC password, I think your only solution is to open it up and connect a terminal to the BMC's serial port.

Depending on the task you need to do, though, you may be able to do it with ipmitool when the OS is booted.

bloudraak:
I haven't installed the OS on it yet. :(

I just got the serial cables in the mail, so I'll see what they produce.

ClassicHasClass:
You can use ipmitool from the Petitboot shell too.

bloudraak:
I wasn't able to use it; I just get "command not found". That said, I think the issue is something different. The error message I ended up getting never displays on the monitor, only on the serial console.

When attempting to install the OS, I get the following error after the reboot.


--- Code: ---cpu 0x3b: Vector: 380 (Data Access Out of Range) at [c000201fdd29b620]
    pc: c0000000001d05f0: __free_pages+0x10/0x50
    lr: c000000000123c24: dma_direct_free_pages+0x54/0x90
    sp: c000201fdd29b8b0
   msr: 900000000280b033
   dar: c04240000000dcb4
  current = 0xc000201fdd244200
  paca    = 0xc000201fff704f00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1002, comm = init
Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
enter ? for help
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b8d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
[c000201fdd29b920] c000000000123020 dma_free_attrs+0x100/0x110
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
[c000201fdd29bba0] c00000000064a908 device_shutdown+0x1f8/0x330
[c000201fdd29bc40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
[c000201fdd29bc60] c0000000000cff50 kernel_restart+0x20/0xc0
[c000201fdd29bcd0] c0000000000d0370 __do_sys_reboot+0x1b0/0x2c0
[c000201fdd29be20] c00000000000b50c system_call+0x5c/0x68
--- Exception: c01 (System Call) at 00007fff89bd10a4
SP (7ffff94afef0) is in userspace

--- End code ---

When restarting the server, I get this:


--- Code: ---[520051.754983] configfs-gadget gadget: high-speed config #1: c
[520053.170880] occ-hwmon occ-hwmon.1: OCC found, code level: op_occ_191023a
[520053.414668] occ-hwmon occ-hwmon.2: OCC found, code level: op_occ_191023a
[Disconnected]
[Connected]
 

 

--== Welcome to Hostboot hostboot-a2ddbf3/hbicore.bin ==--
 

  3.10730|secure|SecureROM valid - enabling functionality
  5.51749|Booting from SBE side 0 on master proc=00050000
  5.55317|ISTEP  6. 5 - host_init_fsi
  5.73355|ISTEP  6. 6 - host_set_ipl_parms
  6.03033|ISTEP  6. 7 - host_discover_targets
  7.79252|HWAS|PRESENT> DIMM[03]=A0A0A0A000000000
  7.79254|HWAS|PRESENT> Proc[05]=8800000000000000
  7.79255|HWAS|PRESENT> Core[07]=5645406555000000
  7.91166|ISTEP  6. 8 - host_update_master_tpm
  8.41908|SECURE|Security Access Bit> 0x0000000000000000
  8.41909|SECURE|Secure Mode Disable (via Jumper)> 0xC000000000000000
  8.41928|ISTEP  6. 9 - host_gard
 8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91676|HWAS|Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0
  8.91704|HWAS|FUNCTIONAL> DIMM[03]=20A0A0A000000000
  8.91705|HWAS|FUNCTIONAL> Proc[05]=8800000000000000
  8.91707|HWAS|FUNCTIONAL> Core[07]=5645406555000000
  8.92233|ISTEP  6.11 - host_start_occ_xstop_handler
10.14337|ISTEP  6.12 - host_voltage_config
10.27450|ISTEP  7. 1 - mss_attr_cleanup
10.95124|ISTEP  7. 2 - mss_volt
11.19280|ISTEP  7. 3 - mss_freq
11.56893|ISTEP  7. 4 - mss_eff_config
12.21124|ISTEP  7. 5 - mss_attr_update
12.23273|ISTEP  8. 1 - host_slave_sbe_config
12.44698|ISTEP  8. 2 - host_setup_sbe
12.44873|ISTEP  8. 3 - host_cbs_start
12.49990|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
17.55574|ISTEP  8. 5 - host_attnlisten_proc
17.59473|ISTEP  8. 6 - host_p9_fbc_eff_config
17.60271|ISTEP  8. 7 - host_p9_eff_config_links
17.64173|ISTEP  8. 8 - proc_attr_update
17.64332|ISTEP  8. 9 - proc_chiplet_fabric_scominit
17.68790|ISTEP  8.10 - proc_xbus_scominit
18.97508|ISTEP  8.11 - proc_xbus_enable_ridi
18.99465|ISTEP  8.12 - host_set_voltages
19.08378|ISTEP  9. 1 - fabric_erepair
19.25325|ISTEP  9. 2 - fabric_io_dccal
20.01976|ISTEP  9. 3 - fabric_pre_trainadv
20.04370|ISTEP  9. 4 - fabric_io_run_training
20.20031|ISTEP  9. 5 - fabric_post_trainadv
20.20406|ISTEP  9. 6 - proc_smp_link_layer
20.20861|ISTEP  9. 7 - proc_fab_iovalid
20.49515|ISTEP  9. 8 - host_fbc_eff_config_aggregate
20.52556|ISTEP 10. 1 - proc_build_smp
21.60404|ISTEP 10. 2 - host_slave_sbe_update
22.65219|ISTEP 10. 4 - proc_cen_ref_clk_enable
22.70371|ISTEP 10. 5 - proc_enable_osclite
22.70484|ISTEP 10. 6 - proc_chiplet_scominit
22.78700|ISTEP 10. 7 - proc_abus_scominit
22.81221|ISTEP 10. 8 - proc_obus_scominit
22.81445|ISTEP 10. 9 - proc_npu_scominit
22.83694|ISTEP 10.10 - proc_pcie_scominit
22.92795|ISTEP 10.11 - proc_scomoverride_chiplets
22.93975|ISTEP 10.12 - proc_chiplet_enable_ridi
22.97791|ISTEP 10.13 - host_rng_bist
23.03426|ISTEP 10.14 - host_update_redundant_tpm
23.03943|ISTEP 11. 1 - host_prd_hwreconfig
23.30043|ISTEP 11. 2 - cen_tp_chiplet_init1
23.30748|ISTEP 11. 3 - cen_pll_initf
23.31247|ISTEP 11. 4 - cen_pll_setup
23.31704|ISTEP 11. 5 - cen_tp_chiplet_init2
23.34176|ISTEP 11. 6 - cen_tp_arrayinit
23.34678|ISTEP 11. 7 - cen_tp_chiplet_init3
23.35156|ISTEP 11. 8 - cen_chiplet_init
23.35625|ISTEP 11. 9 - cen_arrayinit
23.36093|ISTEP 11.10 - cen_initf
23.36704|ISTEP 11.11 - cen_do_manual_inits
23.37181|ISTEP 11.12 - cen_startclocks
23.37685|ISTEP 11.13 - cen_scominits
23.38151|ISTEP 12. 1 - mss_getecid
24.21273|ISTEP 12. 2 - dmi_attr_update
24.23695|ISTEP 12. 3 - proc_dmi_scominit
24.29667|ISTEP 12. 4 - cen_dmi_scominit
24.30138|ISTEP 12. 5 - dmi_erepair
24.36646|ISTEP 12. 6 - dmi_io_dccal
24.37048|ISTEP 12. 7 - dmi_pre_trainadv
24.37525|ISTEP 12. 8 - dmi_io_run_training
24.39397|ISTEP 12. 9 - dmi_post_trainadv
24.39866|ISTEP 12.10 - proc_cen_framelock
24.40360|ISTEP 12.11 - host_startprd_dmi
24.40765|ISTEP 12.12 - host_attnlisten_memb
24.41171|ISTEP 12.13 - cen_set_inband_addr
24.41584|ISTEP 13. 1 - host_disable_memvolt
24.58138|ISTEP 13. 2 - mem_pll_reset
24.62217|ISTEP 13. 3 - mem_pll_initf
24.66147|ISTEP 13. 4 - mem_pll_setup
24.69804|ISTEP 13. 6 - mem_startclocks
24.71370|ISTEP 13. 7 - host_enable_memvolt
24.73321|ISTEP 13. 8 - mss_scominit
25.30592|ISTEP 13. 9 - mss_ddr_phy_reset
25.40047|ISTEP 13.10 - mss_draminit
25.98234|ISTEP 13.11 - mss_draminit_training
28.04184|ISTEP 13.12 - mss_draminit_trainadv
28.28842|ISTEP 13.13 - mss_draminit_mc
28.32749|ISTEP 14. 1 - mss_memdiag
33.56260|ISTEP 14. 2 - mss_thermal_init
33.62293|ISTEP 14. 3 - proc_pcie_config
33.67115|ISTEP 14. 4 - mss_power_cleanup
33.67716|ISTEP 14. 5 - proc_setup_bars
33.71607|ISTEP 14. 6 - proc_htm_setup
33.72956|ISTEP 14. 7 - proc_exit_cache_contained
33.76993|ISTEP 15. 1 - host_build_stop_image
37.33691|ISTEP 15. 2 - proc_set_pba_homer_bar
37.40133|ISTEP 15. 3 - host_establish_ex_chiplet
37.43321|ISTEP 15. 4 - host_start_stop_engine
37.46371|ISTEP 16. 1 - host_activate_master
38.70238|ISTEP 16. 2 - host_activate_slave_cores
38.89911|ISTEP 16. 3 - host_secure_rng
38.88787|ISTEP 16. 4 - mss_scrub
38.90828|ISTEP 16. 5 - host_load_io_ppe
38.94111|ISTEP 16. 6 - host_ipl_complete
39.32079|ISTEP 18.11 - proc_tod_setup
39.44248|ISTEP 18.12 - proc_tod_init
39.43995|ISTEP 20. 1 - host_load_payload
40.13808|ISTEP 20. 2 - host_load_hdat
41.63064|ISTEP 21. 1 - host_runtime_setup
53.10080|htmgt|OCCs are now running in ACTIVE state
58.33571|ISTEP 21. 2 - host_verify_hdat
58.37178|ISTEP 21. 3 - host_start_payload
[   59.225278060,5] OPAL skiboot-9858186 starting...
[   59.225281070,7] initial console log level: memory 7, driver 5
[   59.225283107,6] CPU: P9 generation processor (max 4 threads/core)
[   59.225284921,7] CPU: Boot CPU PIR is 0x0834 PVR is 0x004e1203
[   59.225287534,7] OPAL table: 0x30103830 .. 0x30103e10, branch table: 0x30002000
[   59.225290544,7] Assigning physical memory map table for nimbus
[   59.225293297,7] Parsing HDAT...
[   59.225294624,7] SPIRA-S found.
[   59.225296926,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[   59.225457609,6] SP Family is ibm,ast2500,openbmc
[   59.225463830,7] LPC: IOPATH chip id = 0
[   59.225465150,7] LPC: FW BAR       = f0000000
[   59.225466678,7] LPC: MEM BAR      = e0000000
[   59.225468142,7] LPC: IO BAR       = d0010000
[   59.225469592,7] LPC: Internal BAR = c0012000
[   59.225482159,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[   59.225484833,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[   59.227438048,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   59.227553256,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   59.227606613,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
[   59.227659260,5] HDAT I2C: found e3p1 - unknown@1e dp:ff (ff:)
[   59.227704460,5] HDAT I2C: found e3p0 - unknown@1b dp:ff (ff:)
[   59.227754386,5] HDAT I2C: found e3p1 - unknown@1f dp:ff (ff:)
[   59.227819475,5] HDAT I2C: found e3p0 - unknown@1a dp:ff (ff:)
[   59.227898145,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[   59.228269121,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   59.228347500,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   59.228398443,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
Petitboot (0ed84c0-p94177c1)                         T2P9D01 REV 1.00 A1000645
──────────────────────────────────────────────────────────────────────────────
 

  System information
  System configuration
  System status log
  Language
  Rescan devices
  Retrieve config from URL
  Plugins (0)
*Exit to shell

──────────────────────────────────────────────────────────────────────────────
Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
[enP4p1s0f1] Probing from base tftp://192.168.0.1/pxelinux.cfg/

--- End code ---

At this point I wait for the Debian installer (on the USB drive plugged into the back of the machine) to show up, and then select "Expert Installation". After that it prints that a SIGTERM was received and spits out the following.


--- Code: ---cpu 0x3b: Vector: 380 (Data Access Out of Range) at [c000201fdd29b620]
    pc: c0000000001d05f0: __free_pages+0x10/0x50
    lr: c000000000123c24: dma_direct_free_pages+0x54/0x90
    sp: c000201fdd29b8b0
   msr: 900000000280b033
   dar: c04240000000dcb4
  current = 0xc000201fdd244200
  paca    = 0xc000201fff704f00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1002, comm = init
Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
enter ? for help
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b8d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
[c000201fdd29b920] c000000000123020 dma_free_attrs+0x100/0x110
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
[c000201fdd29bba0] c00000000064a908 device_shutdown+0x1f8/0x330
[c000201fdd29bc40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
[c000201fdd29bc60] c0000000000cff50 kernel_restart+0x20/0xc0
[c000201fdd29bcd0] c0000000000d0370 __do_sys_reboot+0x1b0/0x2c0
[c000201fdd29be20] c00000000000b50c system_call+0x5c/0x68
--- Exception: c01 (System Call) at 00007fff89bd10a4
SP (7ffff94afef0) is in userspace

--- End code ---
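
One way to read the trace above: every frame between `device_shutdown` and `dma_pool_destroy` that carries a module tag belongs to `mpt3sas`, the Linux driver for LSI SAS3 HBAs, so the fault happens while that driver is shutting down during the reboot. The following is a minimal sketch, not a diagnosis, that pulls the module tags out of an xmon backtrace pasted as text. The regular expression and the function names are assumptions for illustration, based only on the frame layout shown in the trace.

```python
import re
from collections import Counter

# Frame layout assumed from the trace in this post, for example:
# [c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
FRAME = re.compile(
    r"^\[(?P<sp>[0-9a-f]{16})\]\s+(?P<addr>[0-9a-f]{16})\s+"
    r"(?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)

def frames(trace):
    """Return (symbol, module) pairs; module is None when the frame has no [module] tag."""
    out = []
    for line in trace.splitlines():
        m = FRAME.match(line.strip())
        if m:
            out.append((m.group("symbol"), m.group("module")))
    return out

def modules_in_trace(trace):
    """Count how many frames each loadable module contributes to the trace."""
    return Counter(mod for _, mod in frames(trace) if mod)

# A few frames copied from the trace above.
SAMPLE = """\
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
--- Exception: c01 (System Call) at 00007fff89bd10a4
"""

if __name__ == "__main__":
    for module, count in modules_in_trace(SAMPLE).items():
        print(f"{module}: {count} frame(s)")
```

Counting tags is deliberately crude. It only shows which driver was on the stack when the fault hit, which here points at the SAS HBA's shutdown path rather than at the installer itself.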

What jumps out is the following:


--- Code: ---  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010

--- End code ---

What is this about?
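
Going only by the log itself, the GARD record is followed by a "Deconfig" of DIMM0, and the FUNCTIONAL DIMM mask then differs from the PRESENT mask. The following is a rough sketch that extracts both facts from a pasted Hostboot console log. The regular expressions and function names are assumptions for illustration. So is the bit numbering: position 0 is taken to be the most significant bit of the mask, which is consistent with this log but is not taken from any documentation.

```python
import re

# Patterns assumed from the Hostboot lines quoted in this post.
GARD = re.compile(
    r"Applying GARD record for HUID=(?P<huid>0x[0-9A-Fa-f]+) "
    r"\(Physical:(?P<path>[^)]+)\) due to (?P<reason>0x[0-9A-Fa-f]+)"
)
MASK = re.compile(
    r"HWAS\|(?P<kind>PRESENT|FUNCTIONAL)> (?P<unit>\w+)\[\d+\]=(?P<mask>[0-9A-Fa-f]{16})"
)

def garded_targets(log):
    """Unique (huid, path, reason) tuples, in order of first appearance."""
    seen = []
    for m in GARD.finditer(log):
        entry = (m["huid"], m["path"], m["reason"])
        if entry not in seen:
            seen.append(entry)
    return seen

def deconfigured(log, unit="DIMM"):
    """Positions set in PRESENT but not in FUNCTIONAL (position 0 = most significant bit, an assumption)."""
    masks = {m["kind"]: int(m["mask"], 16)
             for m in MASK.finditer(log) if m["unit"] == unit}
    missing = masks["PRESENT"] & ~masks["FUNCTIONAL"]
    return [i for i in range(64) if missing & (1 << (63 - i))]

# Lines copied from the boot log above.
LOG = """\
  7.79252|HWAS|PRESENT> DIMM[03]=A0A0A0A000000000
  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91676|HWAS|Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0
  8.91704|HWAS|FUNCTIONAL> DIMM[03]=20A0A0A000000000
"""

if __name__ == "__main__":
    print(garded_targets(LOG))
    print(deconfigured(LOG))
```

Under that bit-numbering assumption, the only position that is present but not functional is 0, which matches the `DIMM0` path in the GARD and Deconfig lines. In other words, the firmware appears to be booting with one of the eight DIMMs left out.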

The box consists of:


* Talos™ II Mainboard (Board Only)
* 2U Heatsink Assembly for POWER9 CPUs
* LSI 9300-8i 8-port Internal SAS 3.0 HBA
* M393A4K40BB2-CTD - Samsung 1x 32GB DDR4-2666 RDIMM PC4-21300V-R Dual Rank x4 Module (x8 = 256GB Memory)
* Supermicro CSE-836BE1C-R1K03B Server Chassis 3U Rackmount
* 3TB SAS3 drives (x16)

From the boot sequence, I can see that it detects the SAS3 hard disks as well as the memory.

Any suggestions would be appreciated. I can't find much on the internet about this error or why it may occur. I also can't find any documentation on commands to run after the initial IPL completes but before installing the operating system. Is there any? Where can I learn more about

I did create a ticket (#367172) with support, but given that they usually take weeks to respond, I don't have much faith that I'll have this resolved anytime soon through that channel.
