Messages - bloudraak

1
Talos II / Re: Resetting Talos II to "Factory Settings"
« on: April 18, 2020, 03:48:16 pm »
I wasn't able to use it; I just get "command not found". That said, I think the issue is something else. The error message I eventually got never appears on the monitor, only on the serial console.

When attempting to install the OS, I get the following error after the reboot.

Code:
cpu 0x3b: Vector: 380 (Data Access Out of Range) at [c000201fdd29b620]
    pc: c0000000001d05f0: __free_pages+0x10/0x50
    lr: c000000000123c24: dma_direct_free_pages+0x54/0x90
    sp: c000201fdd29b8b0
   msr: 900000000280b033
   dar: c04240000000dcb4
  current = 0xc000201fdd244200
  paca    = 0xc000201fff704f00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1002, comm = init
Linux version 5.5.0-openpower1 (root@raptor-build-public-staging-01) (gcc version 6.5.0 (Buildroot 2019.05.3-06769-g7bdd570165)) #2 SMP Thu Feb 20 02:19:47 UTC 2020
enter ? for help
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b8d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
[c000201fdd29b920] c000000000123020 dma_free_attrs+0x100/0x110
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
[c000201fdd29bba0] c00000000064a908 device_shutdown+0x1f8/0x330
[c000201fdd29bc40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
[c000201fdd29bc60] c0000000000cff50 kernel_restart+0x20/0xc0
[c000201fdd29bcd0] c0000000000d0370 __do_sys_reboot+0x1b0/0x2c0
[c000201fdd29be20] c00000000000b50c system_call+0x5c/0x68
--- Exception: c01 (System Call) at 00007fff89bd10a4
SP (7ffff94afef0) is in userspace
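Read from the bottom up, the backtrace shows the reboot system call (__do_sys_reboot) running device_shutdown, which calls the mpt3sas driver's scsih_shutdown; the fault occurs while that driver releases its DMA memory pools. A minimal Python sketch for extracting that call chain from a pasted oops, assuming the "[sp] pc function+offset/size [module]" frame format shown above. FRAME_RE and call_chain are illustrative names, not part of any kernel tool:

```python
import re

# One backtrace frame, e.g.
# [c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
# Frames without a trailing [module] belong to the core kernel image.
FRAME_RE = re.compile(
    r"^\[(?P<sp>[0-9a-f]+)\]\s+(?P<pc>[0-9a-f]+)\s+"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)

# Frames copied from the oops above (innermost first).
OOPS = """\
[c000201fdd29b8b0] c000000000123c24 dma_direct_free_pages+0x54/0x90 (unreliable)
[c000201fdd29b8d0] c000000000038728 dma_iommu_free_coherent+0x98/0xc0
[c000201fdd29b920] c000000000123020 dma_free_attrs+0x100/0x110
[c000201fdd29b970] c0000000001d9bf4 dma_pool_destroy+0x174/0x200
[c000201fdd29ba10] c00800000e1417e8 _base_release_memory_pools+0x1e0/0x498 [mpt3sas]
[c000201fdd29baa0] c00800000e14b428 mpt3sas_base_detach+0x40/0x160 [mpt3sas]
[c000201fdd29bb10] c00800000e15bb5c scsih_shutdown+0xc4/0x110 [mpt3sas]
[c000201fdd29bb70] c0000000003cda10 pci_device_shutdown+0x50/0xc0
[c000201fdd29bba0] c00000000064a908 device_shutdown+0x1f8/0x330
[c000201fdd29bc40] c0000000000cfe2c kernel_restart_prepare+0x4c/0x60
[c000201fdd29bc60] c0000000000cff50 kernel_restart+0x20/0xc0
[c000201fdd29bcd0] c0000000000d0370 __do_sys_reboot+0x1b0/0x2c0
[c000201fdd29be20] c00000000000b50c system_call+0x5c/0x68
"""


def call_chain(oops_text):
    """Return (function, module) pairs, innermost frame first."""
    frames = []
    for line in oops_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


# Print from the outermost frame (the system call) down to the faulting function.
for func, module in reversed(call_chain(OOPS)):
    print(f"{module:8} {func}")
```

Frames tagged "(unreliable)" are still captured; the module column simply shows which frames come from the mpt3sas driver rather than the core kernel.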

When restarting the server, I get the following:

Code:
[520051.754983] configfs-gadget gadget: high-speed config #1: c
[520053.170880] occ-hwmon occ-hwmon.1: OCC found, code level: op_occ_191023a
[520053.414668] occ-hwmon occ-hwmon.2: OCC found, code level: op_occ_191023a
[Disconnected]
[Connected]
--== Welcome to Hostboot hostboot-a2ddbf3/hbicore.bin ==--
  3.10730|secure|SecureROM valid - enabling functionality
  5.51749|Booting from SBE side 0 on master proc=00050000
  5.55317|ISTEP  6. 5 - host_init_fsi
  5.73355|ISTEP  6. 6 - host_set_ipl_parms
  6.03033|ISTEP  6. 7 - host_discover_targets
  7.79252|HWAS|PRESENT> DIMM[03]=A0A0A0A000000000
  7.79254|HWAS|PRESENT> Proc[05]=8800000000000000
  7.79255|HWAS|PRESENT> Core[07]=5645406555000000
  7.91166|ISTEP  6. 8 - host_update_master_tpm
  8.41908|SECURE|Security Access Bit> 0x0000000000000000
  8.41909|SECURE|Secure Mode Disable (via Jumper)> 0xC000000000000000
  8.41928|ISTEP  6. 9 - host_gard
  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91676|HWAS|Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0
  8.91704|HWAS|FUNCTIONAL> DIMM[03]=20A0A0A000000000
  8.91705|HWAS|FUNCTIONAL> Proc[05]=8800000000000000
  8.91707|HWAS|FUNCTIONAL> Core[07]=5645406555000000
  8.92233|ISTEP  6.11 - host_start_occ_xstop_handler
10.14337|ISTEP  6.12 - host_voltage_config
10.27450|ISTEP  7. 1 - mss_attr_cleanup
10.95124|ISTEP  7. 2 - mss_volt
11.19280|ISTEP  7. 3 - mss_freq
11.56893|ISTEP  7. 4 - mss_eff_config
12.21124|ISTEP  7. 5 - mss_attr_update
12.23273|ISTEP  8. 1 - host_slave_sbe_config
12.44698|ISTEP  8. 2 - host_setup_sbe
12.44873|ISTEP  8. 3 - host_cbs_start
12.49990|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
17.55574|ISTEP  8. 5 - host_attnlisten_proc
17.59473|ISTEP  8. 6 - host_p9_fbc_eff_config
17.60271|ISTEP  8. 7 - host_p9_eff_config_links
17.64173|ISTEP  8. 8 - proc_attr_update
17.64332|ISTEP  8. 9 - proc_chiplet_fabric_scominit
17.68790|ISTEP  8.10 - proc_xbus_scominit
18.97508|ISTEP  8.11 - proc_xbus_enable_ridi
18.99465|ISTEP  8.12 - host_set_voltages
19.08378|ISTEP  9. 1 - fabric_erepair
19.25325|ISTEP  9. 2 - fabric_io_dccal
20.01976|ISTEP  9. 3 - fabric_pre_trainadv
20.04370|ISTEP  9. 4 - fabric_io_run_training
20.20031|ISTEP  9. 5 - fabric_post_trainadv
20.20406|ISTEP  9. 6 - proc_smp_link_layer
20.20861|ISTEP  9. 7 - proc_fab_iovalid
20.49515|ISTEP  9. 8 - host_fbc_eff_config_aggregate
20.52556|ISTEP 10. 1 - proc_build_smp
21.60404|ISTEP 10. 2 - host_slave_sbe_update
22.65219|ISTEP 10. 4 - proc_cen_ref_clk_enable
22.70371|ISTEP 10. 5 - proc_enable_osclite
22.70484|ISTEP 10. 6 - proc_chiplet_scominit
22.78700|ISTEP 10. 7 - proc_abus_scominit
22.81221|ISTEP 10. 8 - proc_obus_scominit
22.81445|ISTEP 10. 9 - proc_npu_scominit
22.83694|ISTEP 10.10 - proc_pcie_scominit
22.92795|ISTEP 10.11 - proc_scomoverride_chiplets
22.93975|ISTEP 10.12 - proc_chiplet_enable_ridi
22.97791|ISTEP 10.13 - host_rng_bist
23.03426|ISTEP 10.14 - host_update_redundant_tpm
23.03943|ISTEP 11. 1 - host_prd_hwreconfig
23.30043|ISTEP 11. 2 - cen_tp_chiplet_init1
23.30748|ISTEP 11. 3 - cen_pll_initf
23.31247|ISTEP 11. 4 - cen_pll_setup
23.31704|ISTEP 11. 5 - cen_tp_chiplet_init2
23.34176|ISTEP 11. 6 - cen_tp_arrayinit
23.34678|ISTEP 11. 7 - cen_tp_chiplet_init3
23.35156|ISTEP 11. 8 - cen_chiplet_init
23.35625|ISTEP 11. 9 - cen_arrayinit
23.36093|ISTEP 11.10 - cen_initf
23.36704|ISTEP 11.11 - cen_do_manual_inits
23.37181|ISTEP 11.12 - cen_startclocks
23.37685|ISTEP 11.13 - cen_scominits
23.38151|ISTEP 12. 1 - mss_getecid
24.21273|ISTEP 12. 2 - dmi_attr_update
24.23695|ISTEP 12. 3 - proc_dmi_scominit
24.29667|ISTEP 12. 4 - cen_dmi_scominit
24.30138|ISTEP 12. 5 - dmi_erepair
24.36646|ISTEP 12. 6 - dmi_io_dccal
24.37048|ISTEP 12. 7 - dmi_pre_trainadv
24.37525|ISTEP 12. 8 - dmi_io_run_training
24.39397|ISTEP 12. 9 - dmi_post_trainadv
24.39866|ISTEP 12.10 - proc_cen_framelock
24.40360|ISTEP 12.11 - host_startprd_dmi
24.40765|ISTEP 12.12 - host_attnlisten_memb
24.41171|ISTEP 12.13 - cen_set_inband_addr
24.41584|ISTEP 13. 1 - host_disable_memvolt
24.58138|ISTEP 13. 2 - mem_pll_reset
24.62217|ISTEP 13. 3 - mem_pll_initf
24.66147|ISTEP 13. 4 - mem_pll_setup
24.69804|ISTEP 13. 6 - mem_startclocks
24.71370|ISTEP 13. 7 - host_enable_memvolt
24.73321|ISTEP 13. 8 - mss_scominit
25.30592|ISTEP 13. 9 - mss_ddr_phy_reset
25.40047|ISTEP 13.10 - mss_draminit
25.98234|ISTEP 13.11 - mss_draminit_training
28.04184|ISTEP 13.12 - mss_draminit_trainadv
28.28842|ISTEP 13.13 - mss_draminit_mc
28.32749|ISTEP 14. 1 - mss_memdiag
33.56260|ISTEP 14. 2 - mss_thermal_init
33.62293|ISTEP 14. 3 - proc_pcie_config
33.67115|ISTEP 14. 4 - mss_power_cleanup
33.67716|ISTEP 14. 5 - proc_setup_bars
33.71607|ISTEP 14. 6 - proc_htm_setup
33.72956|ISTEP 14. 7 - proc_exit_cache_contained
33.76993|ISTEP 15. 1 - host_build_stop_image
37.33691|ISTEP 15. 2 - proc_set_pba_homer_bar
37.40133|ISTEP 15. 3 - host_establish_ex_chiplet
37.43321|ISTEP 15. 4 - host_start_stop_engine
37.46371|ISTEP 16. 1 - host_activate_master
38.70238|ISTEP 16. 2 - host_activate_slave_cores
38.89911|ISTEP 16. 3 - host_secure_rng
38.88787|ISTEP 16. 4 - mss_scrub
38.90828|ISTEP 16. 5 - host_load_io_ppe
38.94111|ISTEP 16. 6 - host_ipl_complete
39.32079|ISTEP 18.11 - proc_tod_setup
39.44248|ISTEP 18.12 - proc_tod_init
39.43995|ISTEP 20. 1 - host_load_payload
40.13808|ISTEP 20. 2 - host_load_hdat
41.63064|ISTEP 21. 1 - host_runtime_setup
53.10080|htmgt|OCCs are now running in ACTIVE state
58.33571|ISTEP 21. 2 - host_verify_hdat
58.37178|ISTEP 21. 3 - host_start_payload
[   59.225278060,5] OPAL skiboot-9858186 starting...
[   59.225281070,7] initial console log level: memory 7, driver 5
[   59.225283107,6] CPU: P9 generation processor (max 4 threads/core)
[   59.225284921,7] CPU: Boot CPU PIR is 0x0834 PVR is 0x004e1203
[   59.225287534,7] OPAL table: 0x30103830 .. 0x30103e10, branch table: 0x30002000
[   59.225290544,7] Assigning physical memory map table for nimbus
[   59.225293297,7] Parsing HDAT...
[   59.225294624,7] SPIRA-S found.
[   59.225296926,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[   59.225457609,6] SP Family is ibm,ast2500,openbmc
[   59.225463830,7] LPC: IOPATH chip id = 0
[   59.225465150,7] LPC: FW BAR       = f0000000
[   59.225466678,7] LPC: MEM BAR      = e0000000
[   59.225468142,7] LPC: IO BAR       = d0010000
[   59.225469592,7] LPC: Internal BAR = c0012000
[   59.225482159,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[   59.225484833,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[   59.227438048,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   59.227553256,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   59.227606613,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
[   59.227659260,5] HDAT I2C: found e3p1 - unknown@1e dp:ff (ff:)
[   59.227704460,5] HDAT I2C: found e3p0 - unknown@1b dp:ff (ff:)
[   59.227754386,5] HDAT I2C: found e3p1 - unknown@1f dp:ff (ff:)
[   59.227819475,5] HDAT I2C: found e3p0 - unknown@1a dp:ff (ff:)
[   59.227898145,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[   59.228269121,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   59.228347500,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   59.228398443,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
Petitboot (0ed84c0-p94177c1)                         T2P9D01 REV 1.00 A1000645
──────────────────────────────────────────────────────────────────────────────
  System information
  System configuration
  System status log
  Language
  Rescan devices
  Retrieve config from URL
  Plugins (0)
*Exit to shell         
──────────────────────────────────────────────────────────────────────────────
Enter=accept, e=edit, n=new, x=exit, l=language, g=log, h=help
[enP4p1s0f1] Probing from base tftp://192.168.0.1/pxelinux.cfg/

At this point I wait for the Debian installer (on the USB drive at the back of the machine) to show up, and then select "Expert Installation". After that, a message says a SIGTERM was received, and the kernel crashes with exactly the same output shown above.

What jumps out is the following:

Code:
  8.91332|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010
  8.91664|HWAS|Applying GARD record for HUID=0x00030000 (Physical:/Sys0/Node0/DIMM0) due to 0x90000010

What is this about?
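The HWAS lines in the boot log can be read as bitmasks: the PRESENT mask for DIMMs is A0A0A0A000000000 and the FUNCTIONAL mask after the GARD records are applied is 20A0A0A000000000, so exactly one bit was cleared. A small Python sketch that compares the two, assuming (this is not confirmed by any Hostboot documentation quoted here) that bits are numbered from the most significant bit and that bit n corresponds to DIMM n. MASK_RE and set_positions are hypothetical helpers:

```python
import re

# HWAS summary lines copied from the Hostboot log above.
LOG = """\
  7.79252|HWAS|PRESENT> DIMM[03]=A0A0A0A000000000
  8.91704|HWAS|FUNCTIONAL> DIMM[03]=20A0A0A000000000
"""

# Captures the state (PRESENT/FUNCTIONAL) and the hex mask; the bracketed
# value ([03]) is skipped because the comparison does not need it.
MASK_RE = re.compile(r"HWAS\|(PRESENT|FUNCTIONAL)> DIMM\[[0-9A-Fa-f]+\]=([0-9A-Fa-f]+)")


def set_positions(mask_hex):
    """Bit positions set in mask_hex, numbering the most significant bit as 0 (assumption)."""
    value = int(mask_hex, 16)
    width = len(mask_hex) * 4
    return [pos for pos in range(width) if (value >> (width - 1 - pos)) & 1]


masks = dict(MASK_RE.findall(LOG))
present = set_positions(masks["PRESENT"])
functional = set_positions(masks["FUNCTIONAL"])
deconfigured = sorted(set(present) - set(functional))

print("present:     ", present)
print("functional:  ", functional)
print("deconfigured:", deconfigured)
```

Under that bit-numbering assumption, the sketch finds eight present positions (matching the eight installed RDIMMs) and a single deconfigured position, 0, which lines up with the "Deconfig HUID 0x00030000, Physical:/Sys0/Node0/DIMM0" line.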

The box consists of:

  • Talos™ II Mainboard (Board Only)
  • 2U Heatsink Assembly for POWER9 CPUs
  • LSI 9300-8i 8-port Internal SAS 3.0 HBA
  • M393A4K40BB2-CTD - Samsung 1x 32GB DDR4-2666 RDIMM PC4-21300V-R Dual Rank x4 Module (x8 = 256GB Memory)
  • Supermicro CSE-836BE1C-R1K03B Server Chassis 3U Rackmount
  • 3TB SAS3 drives (x16)

From the boot sequence, I can see it detects the SAS3 hard disks as well as the memory.

Any suggestions would be appreciated. I can't find much on the internet about this error or why it might occur, nor can I find any documentation of commands to run after the initial IPL completes but before attempting to install the operating system. Is there any? Where can I learn more about

I did create a ticket (#367172) with support, but given that they usually take weeks to respond, I don't have much faith that this will be resolved anytime soon through that channel.

2
Talos II / Re: Resetting Talos II to "Factory Settings"
« on: April 14, 2020, 08:56:58 pm »
I haven't installed the OS on it yet. :(

I just got the serial cables in the mail and will see what they produce.

3
Talos II / Resetting Talos II to "Factory Settings"
« on: April 05, 2020, 02:20:08 pm »
Is there an easy way to reset a Talos II back to factory settings so I can go through the setup process again?

I changed the passwords per the documentation, but now I can't SSH into OpenBMC or log in through the web interface.

4
Talos II / Building a Talos II computer
« on: March 24, 2020, 02:12:31 pm »
Hello,

I have the following equipment

I'm trying to connect the various components and get Debian installed so I can start virtualizing some workloads (mostly for development and testing). There seems to be a disconnect between the Supermicro chassis documentation and the motherboard documentation, especially regarding how the cables from the chassis are connected.

A couple of questions:
  • The SAS-836EL1 backplane comes with a 16-port expander. If I connect it to the HBA, would that be sufficient for the 16 drives?
  • Which cable is used to connect the HBA to the backplane? From the backplane documentation, I gather that an SFF-8484 cable is required, but the HBA specification states it has two internal Mini-SAS HD (SFF-8643) connectors. I'm a bit lost.
  • I'm trying to figure out which backplane cables need to go where (see attached picture). I don't see any markings on them, so I can't tell which cable is which. I have not removed the backplane from the chassis.
  • I'm trying to figure out which power supply cable should be connected where on the motherboard (to be done last).
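With an expander, whether one HBA connection is enough comes down to how many lanes the 16 drives share. A back-of-the-envelope Python sketch, assuming a single four-lane Mini-SAS HD (SFF-8643) uplink from the HBA and either 6 Gb/s (SAS2) or 12 Gb/s (SAS3) per lane, since the actual lane rate depends on the backplane's expander. per_drive_gbps is a hypothetical helper:

```python
def per_drive_gbps(uplinks, lanes_per_uplink, lane_gbps, drives):
    """Raw line rate per drive when every drive behind the expander transfers at once.

    Encoding and protocol overhead are ignored, so usable throughput is lower.
    """
    return uplinks * lanes_per_uplink * lane_gbps / drives


# One x4 SFF-8643 cable from the HBA to the expander, shared by 16 drives.
# Which lane rate applies depends on the expander backplane (SAS2 = 6, SAS3 = 12).
for lane_gbps in (6, 12):
    share = per_drive_gbps(uplinks=1, lanes_per_uplink=4, lane_gbps=lane_gbps, drives=16)
    print(f"{lane_gbps:>2} Gb/s lanes -> {share:.1f} Gb/s per drive")
```

The shared link only becomes a bottleneck when many drives stream at the same time; a second uplink, if the backplane supports one, would double the figure (uplinks=2).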
Are there any videos of people doing a build or going through the process step by step? I really wish Raptor Computing Systems would be a bit more proactive, for example by making videos like Dell's, particularly with Supermicro chassis, since they claim some compatibility.


Thanks,
Werner

5
Talos II / Re: Rackmount Server Chassis for Talos II
« on: February 14, 2020, 08:37:52 am »
My understanding is that the board and 2U heatsink will fit into a 2U chassis, and that if I have fans in the chassis forcing air over the board and CPUs, any excess heat will be dispersed and then extracted by the rack fans.

While there is no air conditioning where the rack is located, the rack itself (NetShelter CX 24U) has extraction fans. The servers will also be spaced 1U apart, which should allow for more airflow and heat dispersal.

That said, would it be better to invest in a 3U or 4U chassis? That seems like wasted space that could be used for a shelf instead (I have a ton of single-board computers I need to fit in too).

6
Talos II / Rackmount Server Chassis for Talos II
« on: February 12, 2020, 09:53:48 am »
Hello,

I just purchased the Talos II board with two IBM POWER9 v2 CPUs (8-core) and 2U heatsink assemblies, and I'm looking for a rack-mountable chassis. I had a look at https://wiki.raptorcs.com/wiki/Talos_II/Hardware_Compatibility_List; are there any other rackmount server chassis people have used for the Talos II?

I have been looking at variants of the SuperChassis CSE-825TQ, such as:

  • SuperChassis 825TQ-R740LPB
  • SuperChassis 825TQ-600LPB

Has anyone tried these? What power supply would you recommend for two IBM POWER9 v2 CPUs (8-core) with 8 SAS drives and 256GB of memory? I only have a 15A/120V power outlet available.
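For the power supply question, the outlet sets the ceiling. A rough Python sketch, assuming the common North American rule of thumb that a continuous load should stay at or below 80% of the circuit rating, and that wall draw equals the supply's DC output divided by its efficiency. The DC load and efficiency values below are hypothetical placeholders, not measurements or published figures for this build:

```python
def outlet_continuous_watts(amps, volts, continuous_percent=80):
    """Continuous load a circuit can carry, using the common 80% rule of thumb."""
    return amps * volts * continuous_percent / 100


def wall_draw_watts(dc_load_watts, psu_efficiency):
    """AC power pulled from the outlet for a given DC load and PSU efficiency."""
    return dc_load_watts / psu_efficiency


budget = outlet_continuous_watts(amps=15, volts=120)
print(f"Continuous budget on a 15A/120V outlet: {budget:.0f} W")

# HYPOTHETICAL placeholders: replace with your own estimate of the DC load
# (CPUs, drives, DIMMs, fans) and the efficiency of the supply you are considering.
HYPOTHETICAL_DC_LOAD_W = 900.0
HYPOTHETICAL_EFFICIENCY = 0.90
draw = wall_draw_watts(HYPOTHETICAL_DC_LOAD_W, HYPOTHETICAL_EFFICIENCY)
print(f"Estimated wall draw: {draw:.0f} W, fits: {draw <= budget}")
```

15 A at 120 V is 1800 W, so the 80% guideline leaves about 1440 W for continuous use; the sketch deliberately leaves the per-component estimates to you.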

Thanks,
Werner
