Author Topic: Resetting Talos II to "Factory Settings"  (Read 9000 times)

bloudraak

  • Newbie
  • *
  • Posts: 12
  • Karma: +1/-0
    • View Profile
Resetting Talos II to "Factory Settings"
« on: April 05, 2020, 02:20:08 pm »
Is there an easy way to reset a Talos II back to factory settings so I can go through the setup process again?

I changed the passwords per the documentation, but now I can't SSH into OpenBMC or log in through the web interface.

ClassicHasClass

  • Sr. Member
  • ****
  • Posts: 467
  • Karma: +35/-0
  • Talospace Earth Orbit
    • View Profile
    • Floodgap
Re: Resetting Talos II to "Factory Settings"
« Reply #1 on: April 06, 2020, 10:59:15 am »
For a lost BMC password, I think your only solution is to open it up and connect a terminal to the BMC's serial port.

Depending on the task you need to do, though, you may be able to do it with ipmitool when the OS is booted.
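
A rough sketch of what the ipmitool route might look like once an OS (or any shell that ships ipmitool) is running on the host. The `user list` and `user set password` subcommands are standard ipmitool, but the channel number (1) and user ID (2) used here are assumptions, and nothing in this thread confirms that this BMC firmware accepts in-band password changes. The helpers only check for the binary and build the command lines; nothing is executed.

```python
import shutil


def ipmitool_available():
    """Return True if an ipmitool binary can be found on the PATH."""
    return shutil.which("ipmitool") is not None


def password_reset_commands(user_id, new_password, channel=1):
    """Build (but do not run) ipmitool invocations for a password change.

    user_id and channel are placeholders (assumptions): read the real
    values from the output of `ipmitool user list <channel>` first.
    """
    return [
        # Show the configured BMC users on the given channel.
        ["ipmitool", "user", "list", str(channel)],
        # Set a new password for the chosen user ID.
        ["ipmitool", "user", "set", "password", str(user_id), new_password],
    ]
```

Each list could then be handed to `subprocess.run(cmd, check=True)` as root. If `ipmitool_available()` returns False, the tool simply is not present in that environment.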

bloudraak

Re: Resetting Talos II to "Factory Settings"
« Reply #2 on: April 14, 2020, 08:56:58 pm »
I haven't installed the OS on it yet. :(

I just got the serial cables in the mail, so I'll see what they produce.

ClassicHasClass

Re: Resetting Talos II to "Factory Settings"
« Reply #3 on: April 16, 2020, 10:56:50 am »
You can use ipmitool from the Petitboot shell too.

bloudraak

Re: Resetting Talos II to "Factory Settings"
« Reply #4 on: April 18, 2020, 03:48:16 pm »
I wasn't able to use it; I just get "command not found". That said, I think the issue is something different. The error message I ended up getting never displays on the monitor, only on the serial console.

When attempting to install the OS, I get the following error after the reboot.

Code:
cpu 0x3b: Vector: 380 (Data Access Out of Range) at [c000201fdd29b620]
    pc: c0000000001d05f0: __free_pages+0x10/0x50
    lr: c000000000123c24: dma_direct_free_pages+0x54/0x90
    sp: c000201fdd29b8b0
   msr: 900000000280b033
   dar: c04240000000dcb4
  current = 0xc000201fdd244200
  paca    = 0xc000201fff704f00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1002, comm = init
Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
enter ? for help
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b8d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
[c000201fdd29b920] c000000000123020 dma_free_attrs+0x100/0x110
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
[c000201fdd29bba0] c00000000064a908 device_shutdown+0x1f8/0x330
[c000201fdd29bc40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
[c000201fdd29bc60] c0000000000cff50 kernel_restart+0x20/0xc0
[c000201fdd29bcd0] c0000000000d0370 __do_sys_reboot+0x1b0/0x2c0
[c000201fdd29be20] c00000000000b50c system_call+0x5c/0x68
--- Exception: c01 (System Call) at 00007fff89bd10a4
SP (7ffff94afef0) is in userspace

When restarting the server

Code:
[520051.754983] configfs-gadget gadget: high-speed config #1: c
[520053.170880] occ-hwmon occ-hwmon.1: OCC found, code level: op_occ_191023a
[520053.414668] occ-hwmon occ-hwmon.2: OCC found, code level: op_occ_191023a
[Disconnected]
[Connected]
--== Welcome to Hostboot hostboot-a2ddbf3/hbicore.bin ==--
 

  3.10730|secure|SecureROM valid - enabling functionality
  5.51749|Booting from SBE side 0 on master proc=00050000
  5.55317|ISTEP  6. 5 - host_init_fsi
  5.73355|ISTEP  6. 6 - host_set_ipl_parms
  6.03033|ISTEP  6. 7 - host_discover_targets
  7.79252|HWAS|PRESENT> DIMM[03]=A0A0A0A000000000
  7.79254|HWAS|PRESENT> Proc[05]=8800000000000000
  7.79255|HWAS|PRESENT> Core[07]=5645406555000000
  7.91166|ISTEP  6. 8 - host_update_master_tpm
  8.41908|SECURE|Security Access Bit> 0x0000000000000000
  8.41909|SECURE|Secure Mode Disable (via Jumper)> 0xC000000000000000
  8.41928|ISTEP  6. 9 - host_gard
  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91676|HWAS|Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0
  8.91704|HWAS|FUNCTIONAL> DIMM[03]=20A0A0A000000000
  8.91705|HWAS|FUNCTIONAL> Proc[05]=8800000000000000
  8.91707|HWAS|FUNCTIONAL> Core[07]=5645406555000000
  8.92233|ISTEP  6.11 - host_start_occ_xstop_handler
10.14337|ISTEP  6.12 - host_voltage_config
10.27450|ISTEP  7. 1 - mss_attr_cleanup
10.95124|ISTEP  7. 2 - mss_volt
11.19280|ISTEP  7. 3 - mss_freq
11.56893|ISTEP  7. 4 - mss_eff_config
12.21124|ISTEP  7. 5 - mss_attr_update
12.23273|ISTEP  8. 1 - host_slave_sbe_config
12.44698|ISTEP  8. 2 - host_setup_sbe
12.44873|ISTEP  8. 3 - host_cbs_start
12.49990|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
17.55574|ISTEP  8. 5 - host_attnlisten_proc
17.59473|ISTEP  8. 6 - host_p9_fbc_eff_config
17.60271|ISTEP  8. 7 - host_p9_eff_config_links
17.64173|ISTEP  8. 8 - proc_attr_update
17.64332|ISTEP  8. 9 - proc_chiplet_fabric_scominit
17.68790|ISTEP  8.10 - proc_xbus_scominit
18.97508|ISTEP  8.11 - proc_xbus_enable_ridi
18.99465|ISTEP  8.12 - host_set_voltages
19.08378|ISTEP  9. 1 - fabric_erepair
19.25325|ISTEP  9. 2 - fabric_io_dccal
20.01976|ISTEP  9. 3 - fabric_pre_trainadv
20.04370|ISTEP  9. 4 - fabric_io_run_training
20.20031|ISTEP  9. 5 - fabric_post_trainadv
20.20406|ISTEP  9. 6 - proc_smp_link_layer
20.20861|ISTEP  9. 7 - proc_fab_iovalid
20.49515|ISTEP  9. 8 - host_fbc_eff_config_aggregate
20.52556|ISTEP 10. 1 - proc_build_smp
21.60404|ISTEP 10. 2 - host_slave_sbe_update
22.65219|ISTEP 10. 4 - proc_cen_ref_clk_enable
22.70371|ISTEP 10. 5 - proc_enable_osclite
22.70484|ISTEP 10. 6 - proc_chiplet_scominit
22.78700|ISTEP 10. 7 - proc_abus_scominit
22.81221|ISTEP 10. 8 - proc_obus_scominit
22.81445|ISTEP 10. 9 - proc_npu_scominit
22.83694|ISTEP 10.10 - proc_pcie_scominit
22.92795|ISTEP 10.11 - proc_scomoverride_chiplets
22.93975|ISTEP 10.12 - proc_chiplet_enable_ridi
22.97791|ISTEP 10.13 - host_rng_bist
23.03426|ISTEP 10.14 - host_update_redundant_tpm
23.03943|ISTEP 11. 1 - host_prd_hwreconfig
23.30043|ISTEP 11. 2 - cen_tp_chiplet_init1
23.30748|ISTEP 11. 3 - cen_pll_initf
23.31247|ISTEP 11. 4 - cen_pll_setup
23.31704|ISTEP 11. 5 - cen_tp_chiplet_init2
23.34176|ISTEP 11. 6 - cen_tp_arrayinit
23.34678|ISTEP 11. 7 - cen_tp_chiplet_init3
23.35156|ISTEP 11. 8 - cen_chiplet_init
23.35625|ISTEP 11. 9 - cen_arrayinit
23.36093|ISTEP 11.10 - cen_initf
23.36704|ISTEP 11.11 - cen_do_manual_inits
23.37181|ISTEP 11.12 - cen_startclocks
23.37685|ISTEP 11.13 - cen_scominits
23.38151|ISTEP 12. 1 - mss_getecid
24.21273|ISTEP 12. 2 - dmi_attr_update
24.23695|ISTEP 12. 3 - proc_dmi_scominit
24.29667|ISTEP 12. 4 - cen_dmi_scominit
24.30138|ISTEP 12. 5 - dmi_erepair
24.36646|ISTEP 12. 6 - dmi_io_dccal
24.37048|ISTEP 12. 7 - dmi_pre_trainadv
24.37525|ISTEP 12. 8 - dmi_io_run_training
24.39397|ISTEP 12. 9 - dmi_post_trainadv
24.39866|ISTEP 12.10 - proc_cen_framelock
24.40360|ISTEP 12.11 - host_startprd_dmi
24.40765|ISTEP 12.12 - host_attnlisten_memb
24.41171|ISTEP 12.13 - cen_set_inband_addr
24.41584|ISTEP 13. 1 - host_disable_memvolt
24.58138|ISTEP 13. 2 - mem_pll_reset
24.62217|ISTEP 13. 3 - mem_pll_initf
24.66147|ISTEP 13. 4 - mem_pll_setup
24.69804|ISTEP 13. 6 - mem_startclocks
24.71370|ISTEP 13. 7 - host_enable_memvolt
24.73321|ISTEP 13. 8 - mss_scominit
25.30592|ISTEP 13. 9 - mss_ddr_phy_reset
25.40047|ISTEP 13.10 - mss_draminit
25.98234|ISTEP 13.11 - mss_draminit_training
28.04184|ISTEP 13.12 - mss_draminit_trainadv
28.28842|ISTEP 13.13 - mss_draminit_mc
28.32749|ISTEP 14. 1 - mss_memdiag
33.56260|ISTEP 14. 2 - mss_thermal_init
33.62293|ISTEP 14. 3 - proc_pcie_config
33.67115|ISTEP 14. 4 - mss_power_cleanup
33.67716|ISTEP 14. 5 - proc_setup_bars
33.71607|ISTEP 14. 6 - proc_htm_setup
33.72956|ISTEP 14. 7 - proc_exit_cache_contained
33.76993|ISTEP 15. 1 - host_build_stop_image
37.33691|ISTEP 15. 2 - proc_set_pba_homer_bar
37.40133|ISTEP 15. 3 - host_establish_ex_chiplet
37.43321|ISTEP 15. 4 - host_start_stop_engine
37.46371|ISTEP 16. 1 - host_activate_master
38.70238|ISTEP 16. 2 - host_activate_slave_cores
38.89911|ISTEP 16. 3 - host_secure_rng
38.88787|ISTEP 16. 4 - mss_scrub
38.90828|ISTEP 16. 5 - host_load_io_ppe
38.94111|ISTEP 16. 6 - host_ipl_complete
39.32079|ISTEP 18.11 - proc_tod_setup
39.44248|ISTEP 18.12 - proc_tod_init
39.43995|ISTEP 20. 1 - host_load_payload
40.13808|ISTEP 20. 2 - host_load_hdat
41.63064|ISTEP 21. 1 - host_runtime_setup
53.10080|htmgt|OCCs are now running in ACTIVE state
58.33571|ISTEP 21. 2 - host_verify_hdat
58.37178|ISTEP 21. 3 - host_start_payload
[   59.225278060,5] OPAL skiboot-9858186 starting...
[   59.225281070,7] initial console log level: memory 7, driver 5
[   59.225283107,6] CPU: P9 generation processor (max 4 threads/core)
[   59.225284921,7] CPU: Boot CPU PIR is 0x0834 PVR is 0x004e1203
[   59.225287534,7] OPAL table: 0x30103830 .. 0x30103e10, branch table: 0x30002000
[   59.225290544,7] Assigning physical memory map table for nimbus
[   59.225293297,7] Parsing HDAT...
[   59.225294624,7] SPIRA-S found.
[   59.225296926,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[   59.225457609,6] SP Family is ibm,ast2500,openbmc
[   59.225463830,7] LPC: IOPATH chip id = 0
[   59.225465150,7] LPC: FW BAR       = f0000000
[   59.225466678,7] LPC: MEM BAR      = e0000000
[   59.225468142,7] LPC: IO BAR       = d0010000
[   59.225469592,7] LPC: Internal BAR = c0012000
[   59.225482159,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[   59.225484833,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[   59.227438048,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   59.227553256,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   59.227606613,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
[   59.227659260,5] HDAT I2C: found e3p1 - unknown@1e dp:ff (ff:)
[   59.227704460,5] HDAT I2C: found e3p0 - unknown@1b dp:ff (ff:)
[   59.227754386,5] HDAT I2C: found e3p1 - unknown@1f dp:ff (ff:)
[   59.227819475,5] HDAT I2C: found e3p0 - unknown@1a dp:ff (ff:)
[   59.227898145,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[   59.228269121,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   59.228347500,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   59.228398443,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
Petitboot (0ed84c0-p94177c1)                         T2P9D01 REV 1.00 A1000645
──────────────────────────────────────────────────────────────────────────────
 

  System information
  System configuration
  System status log
  Language
  Rescan devices
  Retrieve config from URL
  Plugins (0)
*Exit to shell
──────────────────────────────────────────────────────────────────────────────
Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
[enP4p1s0f1] Probing from base tftp://192.168.0.1/pxelinux.cfg/

At this point I wait for the Debian installer (on the USB drive plugged into the back of the machine) to show up, and then select "Expert Installation". After that it prints that a SIGTERM was received and spits out the following.

Code:
cpu 0x3b: Vector: 380 (Data Access Out of Range) at [c000201fdd29b620]
    pc: c0000000001d05f0: __free_pages+0x10/0x50
    lr: c000000000123c24: dma_direct_free_pages+0x54/0x90
    sp: c000201fdd29b8b0
   msr: 900000000280b033
   dar: c04240000000dcb4
  current = 0xc000201fdd244200
  paca    = 0xc000201fff704f00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1002, comm = init
Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
enter ? for help
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b8d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
[c000201fdd29b920] c000000000123020 dma_free_attrs+0x100/0x110
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
[c000201fdd29bba0] c00000000064a908 device_shutdown+0x1f8/0x330
[c000201fdd29bc40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
[c000201fdd29bc60] c0000000000cff50 kernel_restart+0x20/0xc0
[c000201fdd29bcd0] c0000000000d0370 __do_sys_reboot+0x1b0/0x2c0
[c000201fdd29be20] c00000000000b50c system_call+0x5c/0x68
--- Exception: c01 (System Call) at 00007fff89bd10a4
SP (7ffff94afef0) is in userspace

What jumps out is the following:

Code:
  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010

What is this about?
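
Going only by what the Hostboot log itself prints: the GARD lines name a single target (DIMM0), the next line says "Deconfig" for the same HUID, and the DIMM mask changes between the PRESENT and FUNCTIONAL lines. A minimal sketch for pulling that out of a captured log follows. The function names are made up for illustration, and reading the mask most-significant-bit first is an assumption (it happens to line up with DIMM0 here).

```python
import re

# Matches e.g. "HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010"
GARD_RE = re.compile(
    r"Applying GARD record for HUID=(0x[0-9A-Fa-f]+) "
    r"\(Physical:([^)]+)\) due to (0x[0-9A-Fa-f]+)"
)
# Matches e.g. "HWAS|PRESENT> DIMM[03]=A0A0A0A000000000"
MASK_RE = re.compile(r"HWAS\|(PRESENT|FUNCTIONAL)> (\w+)\[\w+\]=([0-9A-F]{16})")


def gard_targets(log):
    """Return unique (huid, physical_path, reason) tuples from GARD lines."""
    return sorted({m.groups() for m in GARD_RE.finditer(log)})


def dropped_positions(log, kind="DIMM"):
    """Bit positions set in PRESENT but clear in FUNCTIONAL.

    Positions are counted from the most significant bit (an assumption).
    """
    masks = {}
    for state, found_kind, mask in MASK_RE.findall(log):
        if found_kind == kind:
            masks[state] = int(mask, 16)
    missing = masks["PRESENT"] & ~masks["FUNCTIONAL"]
    return [i for i in range(64) if missing & (1 << (63 - i))]


SAMPLE = """\
  7.79252|HWAS|PRESENT> DIMM[03]=A0A0A0A000000000
  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91676|HWAS|Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0
  8.91704|HWAS|FUNCTIONAL> DIMM[03]=20A0A0A000000000
"""

print(gard_targets(SAMPLE))
print(dropped_positions(SAMPLE))
```

On the excerpt above this reports one GARD target and one dropped position in the DIMM mask. The PRESENT mask has eight bits set, which matches the eight installed modules, and the FUNCTIONAL mask has seven.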

The box consists of:

  • Talos™ II Mainboard (Board Only)
  • 2U Heatsink Assembly for POWER9 CPUs
  • LSI 9300-8i 8-port Internal SAS 3.0 HBA
  • M393A4K40BB2-CTD - Samsung 1x 32GB DDR4-2666 RDIMM PC4-21300V-R Dual Rank x4 Module (x8 = 256GB Memory)
  • Supermicro CSE-836BE1C-R1K03B Server Chassis 3U Rackmount
  • 3TB SAS3 drives (x16)

From the boot sequence, I can see it detects the SAS3 hard disks as well as the memory.

Any suggestions would be appreciated. I can't find much on the internet about the error and why it may occur. And I can't find any documentation of commands to run when the initial IPL is complete but before attempting to install the operating system. Is there any? Where can I learn more about

I did create a ticket (#367172) with support, but given that they usually take weeks to respond, I don't have much faith that this will be resolved anytime soon through that channel.

madscientist159

  • Raptor Staff
  • *****
  • Posts: 47
  • Karma: +11/-0
    • View Profile
Re: Resetting Talos II to "Factory Settings"
« Reply #5 on: April 22, 2020, 08:35:22 pm »
Apologies for not seeing this sooner.  Our (US-based) consumer support is impacted by the COVID-19 situation at the moment; we are currently focusing on business sales and support given the obvious slowdown in consumer interest during the ongoing pandemic, but once that is finally under control we should be able to resume proper consumer support with a much shorter delay.  The biggest problem is that no one really knows the timing yet.  It could be months, or even into the 2021 timeframe, depending on exactly what happens with the fall "flu" season this year, and that makes most decisions in this area difficult at best.

In regards to the technical problem you have, please try the PNOR here:
https://wiki.raptorcs.com/wiki/Talos_II/Firmware/Public_Beta

The backtrace looks a lot like the crash seen due to an unfortunate last-minute Linux kernel bug affecting the LSI SAS controllers.  The new PNOR applies the upstream fix for it to the kernel underneath Petitboot.
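
To illustrate why the backtrace points at the SAS controller driver, here is a small sketch that pulls the frames out of the pasted trace and counts which ones carry a kernel module tag in square brackets. The regular expression and function names are illustrative assumptions based on the line format shown in the post, not part of any kernel tooling.

```python
import re
from collections import Counter

# One frame of the pasted trace looks like:
# [c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
FRAME_RE = re.compile(
    r"^\[(?P<sp>[0-9a-f]{16})\]\s+(?P<addr>[0-9a-f]{16})\s+"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(text):
    """Return (symbol, module_or_None) for every frame line in text."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("symbol"), match.group("module")))
    return frames


def modules_in_trace(text):
    """Count frames per loadable module; built-in kernel frames are skipped."""
    return Counter(module for _, module in parse_frames(text) if module)


SAMPLE = """\
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
"""

print(modules_in_trace(SAMPLE))
```

In the excerpt, the only module-tagged frames belong to mpt3sas, the driver used for the LSI SAS HBA, and they sit on the shutdown path between `pci_device_shutdown` and the DMA pool teardown where the fault is reported.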