Messages - ClassicHasClass

Pages: 1 [2] 3 4 ... 34
16
General CPU Discussion / Re: Performance of HPT vs Radix?
« on: February 21, 2025, 06:37:52 pm »
This may be helpful: https://www.researchgate.net/publication/325937212_IBM_POWER9_system_software

"The HPT has the advantage that a translation can be performed (i.e., a translation lookaside buffer [TLB] miss can be serviced) by reading up to two cache lines from memory. This characteristic should enable the HPT to provide good performance for applications with very large memory footprints and low locality of references (i.e., with essentially random or quasi-random access patterns).

"The disadvantage of the HPT structure is that it does not cache well. The hashing algorithm used in the POWER Memory Management Unit (MMU) tends to put each page table entry (PTE) for a process into a separate cache line. Thus, most TLB misses cause a cache miss and need to read main memory, particularly in processes with large working sets. ...

"Because radix page tables place information about adjacent addresses into adjacent doublewords in memory, a program with high locality of references in its address access pattern will also have high locality of references in the pattern of accesses performed by the MMU to service TLB misses. Thus, the CPU caches work efficiently to cache the page-table entries (PTEs), and so radix tree translation is expected to be more efficient than HPT translation for workloads with high locality of references."

IME, for many workloads, the difference between HPT and radix MMU is imperceptible, though the choice of MMU mode does have some impact on what you can virtualize. On the other hand, for large working sets typical of IBM midrange and "big iron," radix does have real benefits.
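
To make the locality argument in the quoted passage a bit more concrete, here is a toy model in C that counts how many distinct cache lines the MMU would touch to find the PTEs for a run of consecutive pages. It is only a sketch under stated assumptions: 128-byte cache lines, 8-byte (doubleword) radix leaf entries, one 128-byte PTE group per hash bucket, and a simplified XOR hash. TOY_VSID, HPT_GROUPS, and all function names are made up for illustration; upper radix levels, the TLB, and cache replacement are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    CACHE_LINE = 128,  /* bytes per cache line (assumed) */
    RADIX_PTE  = 8,    /* bytes per radix leaf entry: one doubleword */
    HPT_GROUPS = 4096, /* toy hash table size in PTE groups, power of two */
    MAX_LINES  = 4096  /* largest line index this sketch tracks */
};

/* Hypothetical segment ID mixed into the toy hash. */
#define TOY_VSID UINT64_C(0x5A5)

typedef size_t (*line_fn)(uint64_t vpn);

/* Radix, leaf level only: adjacent pages sit in adjacent doublewords,
 * so 16 consecutive PTEs share one 128-byte line. */
static size_t radix_line(uint64_t vpn) {
    return (size_t)((vpn * RADIX_PTE) / CACHE_LINE);
}

/* HPT: each page hashes to its own PTE group, and in this model a
 * group occupies exactly one cache line. Simplified XOR hash. */
static size_t hpt_line(uint64_t vpn) {
    return (size_t)((TOY_VSID ^ vpn) & (HPT_GROUPS - 1));
}

/* Count the distinct cache lines touched while translating npages
 * consecutive pages starting at first_vpn. */
static size_t distinct_lines(line_fn f, uint64_t first_vpn, size_t npages) {
    static unsigned char seen[MAX_LINES];
    size_t count = 0;

    memset(seen, 0, sizeof seen);
    for (size_t i = 0; i < npages; i++) {
        size_t line = f(first_vpn + i);
        assert(line < MAX_LINES); /* stay inside the toy table */
        if (!seen[line]) {
            seen[line] = 1;
            count++;
        }
    }
    return count;
}
```

In this model, distinct_lines(radix_line, 0, 1024) gives 64 while distinct_lines(hpt_line, 0, 1024) gives 1024: a sequential walk over 1024 pages reuses each radix line 16 times but lands on a fresh line for every hashed lookup. Real hardware will differ in the details; the point is only the ratio.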


17
Very nice work. Thank you for posting that.

18
Operating Systems and Porting / Re: powerpc equivalent to x86 intrinsics?
« on: January 30, 2025, 09:54:51 pm »
The semantics are not exact, but _mm_lfence() basically holds until all prior local loads have completed (approximately a load fence), _mm_sfence() holds until all prior stores have completed (approximately a store fence), and _mm_mfence() is effectively a complete memory fence. The GCC __builtin_ia32_lfence built-in is effectively LFENCE.

There are no precise Power equivalents. If you're in doubt and you can afford the hit, something like __asm__("sync; isync\n") should always work as a substitute for any x86 fence instruction in any situation, but it is the slowest option. This combines an instruction sync with a memory sync: all prior instructions must complete and commit their results to memory, making a consistent result visible to other threads, and all succeeding instructions then execute in that context.

That said, a plain __asm__("sync\n") or its synonym hwsync may be sufficient to replace any of these fences. Note that it doesn't discard any prefetched instructions, so it's possible such instructions may still run in the old context. In most cases __asm__("lwsync\n") will also work, and is lighter still; it won't work for certain weird cache situations but shouldn't affect user programs.

Another option is the eieio instruction (the best-named instruction in the ISA, right up there with xxlxor), which is intended for memory-mapped I/O. This makes pending loads and stores run in order. That's not exactly a memory fence but it can act like one and is also pretty quick.

I'd start with replacing them with the heavyweight version, making sure it works, and then seeing what you can get away with. x86's strong memory model can sometimes make things difficult for RISC, especially multi-core/multi-processor systems.
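
As a rough sketch of that "start heavy, then relax" approach, the x86 fences could be hidden behind a set of macros with a compile-time knob. Everything named here (FENCE_LEVEL, the PPC_* and PORT_* macros, publish, consume) is hypothetical and not from any library. The non-Power branch uses the GCC/Clang __atomic_thread_fence builtin only so the sketch builds elsewhere. Keeping the mfence replacement at sync even on the lightest level is a conservative assumption of this sketch, since lwsync does not order a store followed by a later load.

```c
#include <assert.h>

/* 0 = sync; isync (slowest, safest), 1 = sync, 2 = lwsync where plausible */
#ifndef FENCE_LEVEL
#define FENCE_LEVEL 0
#endif

#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
#  define PPC_HEAVY()  __asm__ __volatile__("sync; isync" ::: "memory")
#  define PPC_SYNC()   __asm__ __volatile__("sync" ::: "memory")
#  define PPC_LWSYNC() __asm__ __volatile__("lwsync" ::: "memory")
#else
/* Fallback so the sketch compiles on other architectures. */
#  define PPC_HEAVY()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#  define PPC_SYNC()   __atomic_thread_fence(__ATOMIC_SEQ_CST)
#  define PPC_LWSYNC() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

#if FENCE_LEVEL == 0
#  define PORT_LFENCE() PPC_HEAVY()
#  define PORT_SFENCE() PPC_HEAVY()
#  define PORT_MFENCE() PPC_HEAVY()
#elif FENCE_LEVEL == 1
#  define PORT_LFENCE() PPC_SYNC()
#  define PORT_SFENCE() PPC_SYNC()
#  define PORT_MFENCE() PPC_SYNC()
#else
#  define PORT_LFENCE() PPC_LWSYNC()
#  define PORT_SFENCE() PPC_LWSYNC()
#  define PORT_MFENCE() PPC_SYNC()   /* kept heavier on purpose */
#endif

static int payload;
static volatile int ready;

/* Producer: write the data, fence, then raise the flag, so the flag
 * never becomes visible before the data it guards. */
static void publish(int value) {
    assert(!ready);          /* this sketch publishes exactly once */
    payload = value;
    PORT_SFENCE();           /* was _mm_sfence() in the x86 code */
    ready = 1;
}

/* Consumer: wait for the flag, fence, then read the data. */
static int consume(void) {
    while (!ready) {
        /* spin until the producer has published */
    }
    PORT_LFENCE();           /* was _mm_lfence() in the x86 code */
    return payload;
}
```

The idea would be to build and test with -DFENCE_LEVEL=0 first, then try 1 and 2 under stress on a multi-core machine and keep the lightest level that never misbehaves. New code would usually be better served by C11 atomics than by a flag-and-fence pattern like this; the sketch only mirrors what a straight port of x86 fence calls tends to look like.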

19
General OpenPOWER Discussion / Re: POWER11 on the horizon?
« on: January 24, 2025, 04:45:29 pm »
The OMI part, for sure, and we're all hoping we get direct-attach RAM out of S1 (or there is an option for direct-attach RAM). But I hope they got the Synopsys IP out of there too.

20
Mod Zone / Re: Squeezing out more USB I/O
« on: January 21, 2025, 11:27:45 pm »
Simple and effective.

21
Talos II / Re: SMART repeatedly reporting temperature errors for NVMe SSD
« on: January 15, 2025, 11:19:07 am »
The 2Us should be nearly identical to the towers except for the orientation, and I haven't observed that in my T2 tower, though the side panel is usually slightly open for me to check on the interior. I don't know what the case fans are like in the servers, though; this one came with the standard Supermicro SC747 case fans. I even ended up disabling one to reduce the noise, but the cooling is still fine.

22
Firmware / Re: What editor is available on the BMC?
« on: January 12, 2025, 11:44:03 pm »
It has vi, which is part of Busybox.

23
Blackbird / Re: BMC Fails to Boot
« on: January 12, 2025, 11:42:53 pm »
Yes, it should be the same. See https://www.talospace.com/2020/04/what-to-do-when-bmc-wont-talk-to-you.html for some notes on using it over serial.

24
General OpenPOWER Discussion / Re: POWER11 on the horizon?
« on: December 18, 2024, 08:36:50 pm »
I haven't seen or heard anything myself.

25
Operating Systems and Porting / Re: [NEWS] Fedora 41 is out!
« on: December 07, 2024, 11:58:51 am »
As a contrary point, I've got F41 running fine on both the Blackbird and T2 here, and didn't even have many teething problems with the update. I wonder if a hardware glitch occurred at around the same time; Fedora doesn't do anything with the system firmware that I'm aware of. I agree with atomicdog that the pflash method is more reliable if you're concerned the BMC or PNOR firmware glitched.

26
I don't think this would work as written for the original PowerPC Q3VM, which was big-endian (little-endian PowerPC wasn't really a thing in those days). There may also be a few 32-64 bit edges, though I would think that would be minor.

27
Well that explains why certain interactive pages have been erroring "WebAssembly not defined" when checking the developer console.

I just chalked it up to my user.js template disabling WASM.

Although it may be possible to use some sort of Wasm interpreter or polyfill, the best solution is obviously to get the actual JIT working. Otherwise, yes, there is no Wasm if there is no JIT.

28
It was working for a while in internal test builds, but changes to Firefox have caused it to crash on Wasm startup, deep within the code. I'm continuing to struggle with this and I'm trying to work on a better debugging setup as time permits.

A similar thing happens with Ion (second-stage) compilation, so they are probably related. Right now the JIT works but is limited to Baseline (first-stage) JS compilation and irregexp.

I haven't updated the patches for anything later than 128ESR yet, but it's possible they may apply directly to more recent versions.

29
Talos II / Re: AST disable jumper doesn't disable AST video
« on: November 30, 2024, 02:20:21 pm »
My T2 is nearly exactly the same setup, with the Raptor BTO WX7100. I have the disable jumper on and firmware in BOOTKERNFW. On this machine, the ASPEED video is indeed still active, but it's showing the last stage of Hostboot; my understanding is this is deliberate so you can still monitor boot messages before the GPU is activated. However, after Hostboot completes, Petitboot never shows up on the ASPEED output, only on the WX7100.

Out of curiosity, what does lshw say when you're in the operating system? On this system BMC video is listed as unclaimed (Fedora Linux 41), which is what I would expect.

Code:
     *-pci:5
          description: PCI bridge
          product: POWER9 Host Bridge (PHB4)
          vendor: IBM
          physical id: 105
          bus info: pci@0005:00:00.0
          version: 00
          slot: UOPWR.A100059-Node0-BMC
          width: 32 bits
          clock: 33MHz
          capabilities: pci normal_decode bus_master cap_list
          resources: memory:600c280000000-600c2ffefffff
        *-pci
             description: PCI bridge
             product: AST1150 PCI-to-PCI Bridge
             vendor: ASPEED Technology, Inc.
             physical id: 0
             bus info: pci@0005:01:00.0
             version: 04
             slot: BMC
             width: 32 bits
             clock: 33MHz
             capabilities: pci normal_decode bus_master cap_list
             resources: memory:600c280000000-600c2ffefffff
           *-multimedia UNCLAIMED
                description: Multimedia video controller
                product: ASPEED Graphics Family
                vendor: ASPEED Technology, Inc.
                physical id: 0
                bus info: pci@0005:02:00.0
                version: 41
                slot: BMC
                width: 32 bits
                clock: 33MHz
                capabilities: cap_list
                configuration: latency=0
                resources: memory:600c280000000-600c280ffffff memory:600c281000000-600c28101ffff

And for the record,

Code:
root@tim-bmc:~# i2cget -y 12 0x31 0x00
0x0c
root@tim-bmc:~# i2cget -y 12 0x31 0x07
0x63

This seems later than yours.

30
Just curious: is it not possible to run big on both host and guest? I'm not sure why that's happening to you, but I'd like to see whether it's the mismatch that's the problem, or running big that's the problem.
