Starting with Linux kernel 6.7, users of the AMDGPU driver are no longer able to set power limits below the values recommended by the AMD engineering team for the hardware itself. The new lower limits are intentionally enforced and are set based on each card’s vBIOS specification.
I may be partially responsible for this lazy-ass implementation.
Three months ago I was playing around with Stable Diffusion a lot, and because I sleep in the same room as my PC, I used to lower the GPU’s TDP to 150 W at night to keep it quiet. One day while SD was running, I lowered the TDP in LACT and pressed Apply, but instead of getting quieter, the fans ramped up. I was shocked to see that the card was in fact pulling 420 W instead of its rated 293 W (6900 XT).
I tracked the issue down to the driver applying the power limit incorrectly. Basically, if you set a TDP that’s too low for the current power state, the driver would disable the power limit entirely until the card entered a lower power state, after which your new TDP would be applied correctly.
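A quick way to spot this kind of mismatch is to compare the reported power draw against the configured cap. The sketch below assumes the card exposes `power1_average` and `power1_cap` (both in microwatts) in its hwmon directory, which is what amdgpu documents; some kernels and cards report `power1_input` instead. The function names and the 10% tolerance are made up for illustration.

```python
from pathlib import Path


def read_microwatts(hwmon_dir, name):
    """Read one hwmon power attribute (reported in microwatts) as an int."""
    return int((Path(hwmon_dir) / name).read_text().strip())


def exceeds_cap(draw_uw, cap_uw, tolerance=0.10):
    """Return True if the measured draw is above the cap by more than `tolerance`.

    The tolerance is an arbitrary allowance for short spikes and averaging,
    not a value taken from the driver.
    """
    return draw_uw > cap_uw * (1 + tolerance)


def cap_is_ignored(hwmon_dir):
    """Check a live card; `hwmon_dir` is something like
    /sys/class/drm/card0/device/hwmon/hwmonN (the exact path varies per system)."""
    draw = read_microwatts(hwmon_dir, "power1_average")
    cap = read_microwatts(hwmon_dir, "power1_cap")
    return exceeds_cap(draw, cap)
```

With the numbers from above, `exceeds_cap(420_000_000, 150_000_000)` returns `True`, while a card sitting at its cap returns `False`.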
Running a modern GPU without power limits is bad and potentially dangerous for everything involved: the GPU, the VRMs, and even the power supply cables, which may melt as we’ve seen with NVIDIA cards. So I immediately reported the issue to the AMDGPU developers (my issue is linked in the article).
They quickly came up with a fix, which I tested: it doesn’t allow you to set a TDP lower than the lowest valid TDP for the highest power state. This gets the job done, but it’s more of a kludge than a fix. Ideally the driver should realize that the new TDP is too low for the current power state and switch to a lower power state. I don’t know why AMD implemented such a shitty solution in their official kernel driver.
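From userspace, the effect of the fix looks roughly like the range check below. This is only a sketch of the behavior described above, not the kernel code: it assumes the allowed range comes from the hwmon files `power1_cap_min` and `power1_cap_max` (microwatts), and the function name and the example limits are hypothetical.

```python
def checked_cap_uw(requested_w, cap_min_uw, cap_max_uw):
    """Convert a requested cap in watts to microwatts, rejecting values
    outside the range the driver advertises.

    cap_min_uw / cap_max_uw would be read from power1_cap_min and
    power1_cap_max; here they are passed in so the check is easy to test.
    """
    requested_uw = round(requested_w * 1_000_000)
    if requested_uw < cap_min_uw:
        raise ValueError(
            f"{requested_w} W is below the minimum cap of "
            f"{cap_min_uw / 1_000_000:.0f} W"
        )
    if requested_uw > cap_max_uw:
        raise ValueError(
            f"{requested_w} W is above the maximum cap of "
            f"{cap_max_uw / 1_000_000:.0f} W"
        )
    return requested_uw
```

With a made-up range of 200 W to 300 W, asking for 250 W returns `250000000`, while asking for 150 W raises `ValueError` instead of being applied.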
Alright everyone, time to break out the pitchforks and storm this guy’s house lol
But seriously, this is 100% on AMD, don’t beat yourself up for their laziness.