Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.


Messages - ClassicHasClass

Pages: [1] 2 3 ... 33
1
Operating Systems and Porting / Re: [NEWS] Fedora 42 Beta
« on: March 19, 2025, 09:08:19 pm »
Wayland is fine if you have a GPU. If you don't, it runs everything through llvmpipe. On my 4-core Blackbird with AST graphics it worked, but the fans never quit running (when I switched back to X11, they throttled down again).

2
Ugh. That's not encouraging.

3
Very nicely done.

4
Haven't heard of this yet but maybe some stuff has leaked from the NDA partners.

5
This is going to sound like a flippant response and it really isn't, but pretty much any debugger would do (I use gdb but lldb should handle it also). You could throw an __asm__("trap\n") in your code and you'll drop into the debugger at that point, from which you can single step and look at the registers as you go along.

6
Likely there was some glitch that caused one of your cores to get guarded out. You can clear this condition from the BMC: https://www.talospace.com/2020/05/the-case-of-disappearing-core.html

7
A worthy project!

8
I'm happy to advise, but I can't really do much coding for it. I'm still committed to maintaining the Firefox JIT (in at least the current state) and the DOSBox JIT and my actual coding time in front of the Talos II is increasingly limited.

That said, https://github.com/ioquake/ioq3/blob/main/code/qcommon/vm_powerpc.c looks like where you'd need to do most of the work to get ppc64le to work. ppc64 is a supported target, so you'd need to audit this for endianness and write ppc64le-specific code paths for those sections that assume big endian. That may be sufficient to get it off the ground.

9
Applications and Porting / Re: XenonRecomp
« on: March 04, 2025, 08:37:38 pm »
It probably could be hacked to work on ppc64, though keep in mind that Microsoft VMX128 != AltiVec/VMX (and certainly not VSX) even though there is some overlap.

10
Operating Systems and Porting / Re: [NEWS] Fedora 41 is out!
« on: March 04, 2025, 08:32:24 pm »
Probably, but it's always cleaner if you minimize the update delta. 41 has been working OK for me so far. That said, you could upgrade to 41 after 42 comes out, and then immediately upgrade to 42.

11
General CPU Discussion / Re: Performance of HPT vs Radix?
« on: February 21, 2025, 06:37:52 pm »
This may be helpful: https://www.researchgate.net/publication/325937212_IBM_POWER9_system_software

"The HPT has the advantage that a translation can be performed (i.e., a translation lookaside buffer [TLB] miss can be serviced) by reading up to two cache lines from memory. This characteristic should enable the HPT to provide good performance for applications with very large memory footprints and low locality of references (i.e., with essentially random or quasi-random access patterns).

"The disadvantage of the HPT structure is that it does not cache well. The hashing algorithm used in the POWER Memory Management Unit (MMU) tends to put each page table entry (PTE) for a process into a separate cache line. Thus, most TLB misses cause a cache miss and need to read main memory, particularly in processes with large working sets. ...

"Because radix page tables place information about adjacent addresses into adjacent doublewords in memory, a program with high locality of references in its address access pattern will also have high locality of references in the pattern of accesses performed by the MMU to service TLB misses. Thus, the CPU caches work efficiently to cache the page-table entries (PTEs), and so radix tree translation is expected to be more efficient than HPT translation for workloads with high locality of references."

IME, for many workloads, the difference between HPT and radix MMU is imperceptible, and it does have some impact on what you can virtualize. On the other hand, for large working sets typical of IBM midrange and "big iron," radix does have real benefits.


12
Very nice work. Thank you for posting that.

13
Operating Systems and Porting / Re: powerpc equivalent to x86 intrinsics?
« on: January 30, 2025, 09:54:51 pm »
The semantics are not exact, but _mm_lfence() basically holds until all prior local loads have completed (approximately a load fence), _mm_sfence() holds until all prior stores have completed (approximately a store fence), and _mm_mfence() is effectively a complete memory fence. The GCC __builtin_ia32_lfence built-in is effectively LFENCE.

There are no precise Power equivalents. If you're in doubt and you can afford the hit, something like __asm__("sync; isync\n") should always work as a substitute for any x86 fence instruction in any situation, but it's the slowest option. This combines an instruction sync with a memory sync, forcing all prior instructions to complete and commit their results to memory, making a consistent result visible to other threads; all subsequent instructions then execute in that context.

That said, a plain __asm__("sync\n") or its synonym hwsync may be sufficient to replace any of these fences. Note that it doesn't discard any prefetched instructions, so it's possible such instructions may still run in the old context. In most cases __asm__("lwsync\n") will also work, and is lighter still; it won't work for certain weird cache situations but shouldn't affect user programs.

Another option is the eieio instruction (the best-named instruction in the ISA, right up there with xxlxor), which is intended for memory-mapped I/O. This makes pending loads and stores run in order. That's not exactly a memory fence but it can act like one and is also pretty quick.

I'd start by replacing them with the heavyweight version, making sure it works, and then seeing what you can get away with. x86's strong memory model can sometimes make things difficult for RISC, especially on multi-core/multi-processor systems.

14
General OpenPOWER Discussion / Re: POWER11 on the horizon?
« on: January 24, 2025, 04:45:29 pm »
The OMI part, for sure, and we're all hoping we get direct-attach RAM out of S1 (or there is an option for direct-attach RAM). But I hope they got the Synopsys IP out of there too.

15
Mod Zone / Re: Squeezing out more USB I/O
« on: January 21, 2025, 11:27:45 pm »
Simple and effective.
