Topics - tle

31
General CPU Discussion / HPLinpack benchmarking
« on: June 13, 2024, 01:34:56 am »
Use https://github.com/geerlingguy/top500-benchmark

Code:
$ lscpu
Architecture:             ppc64le
  Byte Order:             Little Endian
CPU(s):                   32
  On-line CPU(s) list:    0-31
Model name:               POWER9, altivec supported
  Model:                  2.3 (pvr 004e 1203)
  Thread(s) per core:     4
  Core(s) per socket:     8
  Socket(s):              1
  Frequency boost:        enabled
  CPU(s) scaling MHz:     100%
  CPU max MHz:            3800.0000
  CPU min MHz:            2166.0000
Caches (sum of all):     
  L1d:                    256 KiB (8 instances)
  L1i:                    256 KiB (8 instances)
  L2:                     4 MiB (8 instances)
  L3:                     80 MiB (8 instances)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-31
Vulnerabilities:         
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Mitigation; RFI Flush, L1D private per thread
  Mds:                    Not affected
  Meltdown:               Mitigation; RFI Flush, L1D private per thread
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Kernel entry/exit barrier (eieio)
  Spectre v1:             Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
  Spectre v2:             Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
  Srbds:                  Not affected
  Tsx async abort:        Not affected


Code:
$ ansible-playbook main.yml --tags "setup,benchmark" --ask-become-pass
  mpirun_output.stdout: |-
    ================================================================================
    HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
    Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
    Modified by Julien Langou, University of Colorado Denver
    ================================================================================
 
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
 
    The following parameter values will be used:
 
    N      :   70717
    NB     :     256
    PMAP   : Row-major process mapping
    P      :       1
    Q      :      32
    PFACT  :   Right
    NBMIN  :       4
    NDIV   :       2
    RFACT  :   Crout
    BCAST  :  1ringM
    DEPTH  :       1
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
 
    --------------------------------------------------------------------------------
 
    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
          ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be               1.110223e-16
    - Computational tests pass if scaled residuals are less than                16.0
 
    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR11C2R4       70717   256     1    32            1650.43             1.4286e+02
    HPL_pdgesv() start time Thu Jun 13 15:57:05 2024
 
    HPL_pdgesv() end time   Thu Jun 13 16:24:36 2024
 
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.41238455e-03 ...... PASSED
    ================================================================================
 
    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------
 
    End of Tests.
    ================================================================================

PLAY RECAP *********************************************************************************************************************************************************************************************************************************************************************
127.0.0.1                  : ok=29   changed=10   unreachable=0    failed=0    skipped=7    rescued=0    ignored=0   
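
As a sanity check on the Gflops figure above, here is a minimal sketch that recomputes it from N and the wall time. It assumes the standard HPL operation count of 2/3*N^3 + 3/2*N^2; the function names are made up for illustration and are not part of HPL or the playbook.

```python
# Values copied from the HPL result line in the output above.
N = 70717          # order of the coefficient matrix
TIME_S = 1650.43   # wall time in seconds


def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops, assuming the usual LU operation count 2/3*n^3 + 3/2*n^2."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


def matrix_gib(n: int) -> float:
    """Memory for an n x n matrix of 8-byte doubles, in GiB."""
    return n * n * 8 / 2**30


gflops = hpl_gflops(N, TIME_S)
print(f"{gflops:.2f} Gflops")
print(f"{matrix_gib(N):.1f} GiB for the matrix")
```

The recomputed rate should agree with the 1.4286e+02 that HPL printed, to within rounding.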

32
General CPU Discussion / Byte Magazine Unix benchmarking
« on: June 05, 2024, 07:32:37 am »
Let's have a bit of fun, shall we? Below are the benchmark results from my Blackbird with an 8-core POWER9. What's your score?

Code:
$ lscpu
Architecture:             ppc64le
  Byte Order:             Little Endian
CPU(s):                   32
  On-line CPU(s) list:    0-31
Model name:               POWER9, altivec supported
  Model:                  2.3 (pvr 004e 1203)
  Thread(s) per core:     4
  Core(s) per socket:     8
  Socket(s):              1
  Frequency boost:        enabled
  CPU(s) scaling MHz:     58%
  CPU max MHz:            3800.0000
  CPU min MHz:            2166.0000
Caches (sum of all):     
  L1d:                    256 KiB (8 instances)
  L1i:                    256 KiB (8 instances)
  L2:                     4 MiB (8 instances)
  L3:                     80 MiB (8 instances)
NUMA:                     
  NUMA node(s):           1
  NUMA node0 CPU(s):      0-31
Vulnerabilities:         
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Mitigation; RFI Flush, L1D private per thread
  Mds:                    Not affected
  Meltdown:               Mitigation; RFI Flush, L1D private per thread
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Kernel entry/exit barrier (eieio)
  Spectre v1:             Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
  Spectre v2:             Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
  Srbds:                  Not affected
  Tsx async abort:        Not affected

Code:
   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

   Version 5.1.3                      Based on the Byte Magazine Unix Benchmark

   Multi-CPU version                  Version 5 revisions by Ian Smith,
                                      Sunnyvale, CA, USA
   January 13, 2011                   johantheghost at yahoo period com

------------------------------------------------------------------------------
   Use directories for:
      * File I/O tests (named fs***) = /home/tle/Work/byte-unixbench/UnixBench/tmp
      * Results                      = /home/tle/Work/byte-unixbench/UnixBench/results
------------------------------------------------------------------------------


1 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

1 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

1 x Execl Throughput  1 2 3

1 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

1 x File Copy 256 bufsize 500 maxblocks  1 2 3

1 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

1 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

1 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

1 x Process Creation  1 2 3

1 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

1 x Shell Scripts (1 concurrent)  1 2 3

1 x Shell Scripts (8 concurrent)  1 2 3

32 x Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

32 x Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

32 x Execl Throughput  1 2 3

32 x File Copy 1024 bufsize 2000 maxblocks  1 2 3

32 x File Copy 256 bufsize 500 maxblocks  1 2 3

32 x File Copy 4096 bufsize 8000 maxblocks  1 2 3

32 x Pipe Throughput  1 2 3 4 5 6 7 8 9 10

32 x Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

32 x Process Creation  1 2 3

32 x System Call Overhead  1 2 3 4 5 6 7 8 9 10

32 x Shell Scripts (1 concurrent)  1 2 3

32 x Shell Scripts (8 concurrent)  1 2 3

========================================================================
   BYTE UNIX Benchmarks (Version 5.1.3)

   System: shrimp-paste: GNU/Linux
   OS: GNU/Linux -- 6.8.11-300.fc40.ppc64le -- #1 SMP Mon May 27 14:48:15 UTC 2024
   Machine: ppc64le (unknown)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   20:34:42 up  7:16,  2 users,  load average: 0.17, 23.10, 33.96; runlevel 2024-06-05

------------------------------------------------------------------------
Benchmark Run: Wed Jun 05 2024 20:34:42 - 21:03:09
32 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       43066559.3 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     4835.0 MWIPS (10.0 s, 7 samples)
Execl Throughput                               3317.1 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        241162.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks           61272.0 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks        846105.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                              779278.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                  41152.3 lps   (10.0 s, 7 samples)
Process Creation                               4803.7 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   4640.7 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   3796.2 lpm   (60.0 s, 2 samples)
System Call Overhead                         745761.8 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   43066559.3   3690.4
Double-Precision Whetstone                       55.0       4835.0    879.1
Execl Throughput                                 43.0       3317.1    771.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     241162.4    609.0
File Copy 256 bufsize 500 maxblocks            1655.0      61272.0    370.2
File Copy 4096 bufsize 8000 maxblocks          5800.0     846105.6   1458.8
Pipe Throughput                               12440.0     779278.4    626.4
Pipe-based Context Switching                   4000.0      41152.3    102.9
Process Creation                                126.0       4803.7    381.2
Shell Scripts (1 concurrent)                     42.4       4640.7   1094.5
Shell Scripts (8 concurrent)                      6.0       3796.2   6327.0
System Call Overhead                          15000.0     745761.8    497.2
                                                                   ========
System Benchmarks Index Score                                         800.9

------------------------------------------------------------------------
Benchmark Run: Wed Jun 05 2024 21:03:09 - 21:32:30
32 CPUs in system; running 32 parallel copies of tests

Dhrystone 2 using register variables      449736205.6 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                   112382.5 MWIPS (9.8 s, 7 samples)
Execl Throughput                              36818.4 lps   (29.8 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks        878485.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          212868.6 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3705686.7 KBps  (30.0 s, 2 samples)
Pipe Throughput                            10828494.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                1449711.4 lps   (10.0 s, 7 samples)
Process Creation                              70064.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  67413.9 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   8397.6 lpm   (60.1 s, 2 samples)
System Call Overhead                       13942866.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  449736205.6  38537.8
Double-Precision Whetstone                       55.0     112382.5  20433.2
Execl Throughput                                 43.0      36818.4   8562.4
File Copy 1024 bufsize 2000 maxblocks          3960.0     878485.1   2218.4
File Copy 256 bufsize 500 maxblocks            1655.0     212868.6   1286.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    3705686.7   6389.1
Pipe Throughput                               12440.0   10828494.4   8704.6
Pipe-based Context Switching                   4000.0    1449711.4   3624.3
Process Creation                                126.0      70064.8   5560.7
Shell Scripts (1 concurrent)                     42.4      67413.9  15899.5
Shell Scripts (8 concurrent)                      6.0       8397.6  13996.0
System Call Overhead                          15000.0   13942866.4   9295.2
                                                                   ========
System Benchmarks Index Score                                        7717.0
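
For anyone comparing scores, here is a rough sketch of how the numbers above fit together. It assumes each index is result / baseline * 10 and that the overall score is the geometric mean of the indexes, which matches the tables above. The function names are made up for illustration.

```python
import math

# (baseline, result) pairs copied from the single-copy run above.
SINGLE_COPY = {
    "Dhrystone 2 using register variables": (116700.0, 43066559.3),
    "Double-Precision Whetstone": (55.0, 4835.0),
    "Execl Throughput": (43.0, 3317.1),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 241162.4),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 61272.0),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 846105.6),
    "Pipe Throughput": (12440.0, 779278.4),
    "Pipe-based Context Switching": (4000.0, 41152.3),
    "Process Creation": (126.0, 4803.7),
    "Shell Scripts (1 concurrent)": (42.4, 4640.7),
    "Shell Scripts (8 concurrent)": (6.0, 3796.2),
    "System Call Overhead": (15000.0, 745761.8),
}


def index(baseline: float, result: float) -> float:
    """Per-test index: the result relative to the baseline, times 10."""
    return result / baseline * 10.0


def overall_score(rows: dict) -> float:
    """Geometric mean of the per-test indexes."""
    logs = [math.log(index(b, r)) for b, r in rows.values()]
    return math.exp(sum(logs) / len(logs))


score = overall_score(SINGLE_COPY)
print(f"single-copy score: {score:.1f}")

# Ratio of the two overall scores reported above (32 copies vs 1 copy).
scaling = 7717.0 / 800.9
print(f"multi-copy / single-copy: {scaling:.2f}x")
```

Because the score is a geometric mean, one weak test such as Pipe-based Context Switching in the single-copy run pulls the total down less than it would with a plain average.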

33
Operating Systems and Porting / [NEWS] Rocky Linux 9.4
« on: May 09, 2024, 09:31:32 pm »
Announcement source https://rockylinux.org/news/rocky-linux-9-4-ga-release

It’s good to know that Alma and Rocky Linux have continued the journey that CentOS gave up.

The more options, the better!

P.S. I have not used it yet, so I am unsure how good it is compared with RHEL.

34
More info: https://fedoramagazine.org/announcing-fedora-linux-40/

F40 brings GNOME 46 and Linux kernel 6.8.7. The Chromium RPM package is officially supported.

IMHO it is probably the best version for my Blackbird to date. The sluggish gfx regressions in mutter are all gone, and everything runs pretty snappy.

amdgpu works out of the box (well, the only thing that does not work is HDMI: there is no signal, which does not impact me much as I am using DisplayPort).

There are a few bugs with certain apps; for example, GNOME Loupe could not open an image file.

35
Rust has emerged as the trendiest language/tech choice for Linux apps lately.

Because wasmtime lacks a ppc64le backend, many popular apps such as the Zed text editor and DenoJS cannot be ported to ppc64le. There was a report from lleabout (https://github.com/bytecodealliance/wasmtime/issues/1183) in the past, but it has not produced any result yet.

If any member could take on the work, I am more than happy to crowdsource a bounty to reward their efforts.

36
Ref: https://lists.gnu.org/archive/html/bug-hurd/2023-10/msg00021.html

I have not used this OS and am unlikely to use it as my main driver; however, it's good to know a purist OSS OS is still alive.

37
Operating Systems and Porting / [NEWS] Fedora 39 is here!
« on: November 07, 2023, 05:17:02 pm »
Ref: https://fedoramagazine.org/announcing-fedora-linux-39/

It is probably one of the best distros for ppc64le as of 2023.

38
Blackbird / Need help with troubleshooting failed to boot issue
« on: November 06, 2023, 09:49:54 pm »
I posted a video outlining the issue I have with my Blackbird at https://youtu.be/8r1HVoI3_8g?si=NhNTnHAdR-HJLJit

TL;DR: Two days ago my old PSU died, so I replaced it with a new PSU, and now my motherboard refuses to power up.

Any help is greatly appreciated.

39
General OpenPOWER Discussion / POWER11 on the horizon?
« on: October 19, 2023, 05:43:15 pm »
According to the article on Phoronix (ref: https://www.phoronix.com/news/GCC-PowerPC-Future-POWER11), it seems to me that the bunch of PowerPC-future patches could be for the next revision of the Power ISA, and likely for POWER11.

40
A PR three years in the making has finally been merged into the main trunk!

Ref: https://github.com/briansmith/ring/pull/1057

The ring library is widely used by many Rust-based projects. Having upstream support for ppc64le is a major milestone! I cannot wait to get DenoJS to support ppc64le.


41
This is yet another source port of Quake 3, and it is considered one of the best Vulkan renderer ports out there.

I am happy that the author has finally accepted my pull request to make it compile for ppc64le.

https://github.com/suijingfeng/vkQuake3/pull/18

42
Operating Systems and Porting / [NEWS] Fedora 39 Beta
« on: September 19, 2023, 05:20:33 pm »
It’s that time of the year again.

https://fedoramagazine.org/announcing-fedora-39-beta/

Fedora 39 is now in Beta. Please have a go at it and report any issues in Bugzilla.



43
UPDATE 31 July 2024: Fedora has adopted the patchset that Debian and other distros are using (https://gitlab.solidsilicon.io/public-development/open-source/chromium/openpower-patches). Many thanks to Than Ngo of Red Hat and Timothy Pearson (Debian package maintainer).

EDIT: Than Ngo of Red Hat has cherry-picked all ppc64le patches into the main trunk. Now we officially have chromium for Fedora 40 or newer!

URL https://src.fedoraproject.org/rpms/chromium/pull-request/37 *STILL WORK IN PROGRESS*

The last time I succeeded in getting Chromium up and running on Fedora was 3 years ago. That was version 88. Now Chromium has reached version 117.

This weekend I spent a bit of time adapting Timothy Pearson's patchset from the Debian deb package to the official Chromium Fedora RPM. So far I can get the whole application compiled and running; HOWEVER, the browser hits SIGSEGV or SIGTRAP on web pages that have JavaScript. So I speculate this might be related to the V8 engine.

The error output in the console is not very helpful:

Code:
[1961086:1961086:0910/235841.808221:ERROR:CONSOLE(1)] "Uncaught SyntaxError: Invalid regular expression: /([^\s]+?)\(([\s\S]*)\)/: Maximum call stack size exceeded", source: chrome://resources/polymer/v3_0/polymer/polymer_bundled.min.js (1)
[0910/235911.891648:ERROR:check.cc(298)] Check failed: false. NOTREACHED log messages are omitted in official builds. Sorry!
[0911/000357.201831:ERROR:check.cc(298)] Check failed: false. NOTREACHED log messages are omitted in official builds. Sorry!

I also tried gdb but could not find any useful backtrace (NOTE: I manually commented out all symbol stripping (`strip`) in the build spec file).

The full log output when trying to open a webpage: https://gist.github.com/runlevel5/3c85515c521ebcfb6ca65e4697b6b1d1

Any idea how to get more logs out of V8 in Chromium?



44
Everything, and I mean every single component, has redundancy. Plus it runs the super fast IBM Power10 CPU!

It's surely unaffordable for us prosumers, but hey, we can all dream that one day Raptor comes out with a Power10 workstation.

Source: https://www.youtube.com/watch?v=7ZdsWebj9Jw

45
Operating Systems and Porting / [NEWS] Linux 6.5
« on: August 27, 2023, 06:08:10 pm »
It's finally out! https://www.phoronix.com/review/linux-65-features

Unfortunately it does not have many changes related to OpenPOWER. One change for Power10 is DEXCR support, and that's pretty much it.
