No, you don't need to be extremely careful, and POWER9 has made this easier by getting rid of dispatch groups. (Performance used to be sensitive to how instructions were packaged into dispatch groups and how those groups made their way through the execution pipeline.) Register renaming is also much more powerful than it used to be, though making use of more registers manually always helps. The biggest sources of slowdowns are inappropriate use of the link register for branching or trampolines, which fouls the return address cache, and avoidable or aliased spills to memory. That's why the FPR<->GPR moves in VSX are so great: moving a value from a GPR to an FPR used to require a store followed by a load, which inevitably aliased if you did it on the stack, so you also needed nops to separate the two. As a general rule, Power chips are also much better at straight-line code than at branching, even when the straight-line code seems to do more work.