Raptor Computing Systems Community Forums (BETA)
General OpenPOWER Hardware => General CPU Discussion => Topic started by: tle on June 13, 2024, 01:34:56 am
-
Use https://github.com/geerlingguy/top500-benchmark
$ lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Model name: POWER9, altivec supported
Model: 2.3 (pvr 004e 1203)
Thread(s) per core: 4
Core(s) per socket: 8
Socket(s): 1
Frequency boost: enabled
CPU(s) scaling MHz: 100%
CPU max MHz: 3800.0000
CPU min MHz: 2166.0000
Caches (sum of all):
L1d: 256 KiB (8 instances)
L1i: 256 KiB (8 instances)
L2: 4 MiB (8 instances)
L3: 80 MiB (8 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Mitigation; RFI Flush, L1D private per thread
Mds: Not affected
Meltdown: Mitigation; RFI Flush, L1D private per thread
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Spectre v1: Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Spectre v2: Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Srbds: Not affected
Tsx async abort: Not affected
$ ansible-playbook main.yml --tags "setup,benchmark" --ask-become-pass
mpirun_output.stdout: |-
================================================================================
HPLinpack 2.3 -- High-Performance Linpack benchmark -- December 2, 2018
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 70717
NB : 256
PMAP : Row-major process mapping
P : 1
Q : 32
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR11C2R4 70717 256 1 32 1650.43 1.4286e+02
HPL_pdgesv() start time Thu Jun 13 15:57:05 2024
HPL_pdgesv() end time Thu Jun 13 16:24:36 2024
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.41238455e-03 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
127.0.0.1 : ok=29 changed=10 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
-
After messing with the parameters a bit, I can get my 16 core blackbird up to 2.1683e+02, I wonder if it's limited by RAM bandwidth (only the two channels).
-
After messing with the parameters a bit, I can get my 16 core blackbird up to 2.1683e+02, I wonder if it's limited by RAM bandwidth (only the two channels).
What is your `P` and `Q` parameters?